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Abstract 

In this paper, we propose two new methods to estimate the dimension- 
reduction directions of the central subspace (CS) by constructing a regression 
model such that the directions are all captured in the regression mean. Com- 
pared with the inverse regression estimation methods (e.g. Li, 1991, 1992; 
Cook and Weisberg, 1991), the new methods require no strong assumptions 
on the design of covariates or the functional relation between regressors and 
the response variable, and have better performance than the inverse regression 
estimation methods for finite samples. Compared with the direct regression 
estimation methods (e.g. Hardle and Stoker, 1989; Hristache, Juditski, Polzehl 
and Spokoiny, 2001; Xia, Tong, Li and Zhu, 2002), which can only estimate the 
directions of CS in the regression mean, the new methods can detect the direc- 
tions of CS exhaustively. Consistency of the estimators and the convergence of 
corresponding algorithms are proved. 
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1 Introduction 



Suppose X is a random vector in MP and F is a univariate random variable. Let 
-^0 = (/?oi) ■ ■ ■ jPoq) denote a p x q orthogonal matrix with q < p, i.e. BqBq = Iq, 
where Iq is a q x q identity matrix. Given B^X, if Y and X are independent, 
i.e. Y X X\BqX, then the space spanned by the column vectors /?oi)/?02, • • • ,Poq, 
S{Bo), is called the dimension reduction space. If all the other dimension reduction 
spaces include S{Bo) as their subspace, then S{Bo) is the so-called central dimension 
reduction subspace (CS); see Cook (1998). The column vectors /3oij /3o2, • • • , Poq are 
called the CS directions. Dimension reduction is a fundamental statistical problem 
both in theory and in practice. See Li (1991, 1992) and Cook (1998) for more 
discussion. If the conditional density function of y given X exists, then the definition 
is equivalent to the conditional density function of Y\X being the same as that of 
Y\BqX for all possible values of X and Y, i.e. 

/m(y|^) = 4|,T,(2/l4^)- (1-1) 

Other alternative definitions for the dimension reduction space can be found in the 
literature. 

In the last decade or so, a series of papers (e.g. Hardle and Stocker, 1989; Li, 
1991; Cook and Weisberg, 1991; Samarov, 1993; Hristache, Juditski, Polzehl and 
Spokoiny, 2001; Yin and Cook, 2002; Xia, Tong, Li and Zhu, 2002; Cook and Li, 
2002; Li, Zha and Chiaromontc, 2004; Luc, 2004) have considered issues related to 
the dimension reduction problem, with the aim of estimating the dimension reduc- 
tion space and relevant functions. The estimation methods in the literature can be 
classified into two groups: inverse regression estimation methods (e.g. SIR, Li, 1991 
and SAVE, Cook and Weisberg, 1991) and direct regression estimation methods 
(e.g. ADE, Hardle and Stoker, 1991 and MAVE of Xia, Tong, Li and Zhu 2002). 
The inverse regression estimation methods are computationally easy and are widely 
used as an initial step in data mining, especially for large data sets. However, these 
methods have poor performance in finite samples and need strong assumptions on 
the design of covariates. The direct regression estimation methods have much bet- 
ter performance for finite samples than the inverse regression estimations. They 
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need no strong requirements on the design of covariates or on the response variable. 
However, the direct regression estimation methods cannot find the directions in CS 
exhaustively, such as those in the conditional variance. 

None of the methods mentioned above use the definitions directly in searching 
for the central space. As a consequence, tlicy fail in one way or another to estimate 
CS efficiently. A straightforward approach in using definition (1.1) is to look for Bq 
in order to minimize the difference between those two conditional density functions. 
The conditional density functions can be estimated using nonparametric smoothers. 
Obviously, this approach is not efficient in theory due to the "curse of dimension- 
ality" in nonparametric smoothing. In calculations, the minimization problem is 
difficult to implement. People have observed that the CS in the regression mean 
function, i.e. the central mean space (CMS), can be estimated much more efficiently 
than the general CS. See, for example, Yin and Cook (2002), Cook and Li (2002) 
and Xia, Tong, Li and Zhu (2002) . Motivated by this observation, one can construct 
a regression model such that the CS coincides with the CMS space in order to re- 
duce the difficulty of estimation. In this paper, wc first construct a regression model 
in which the conditional density function fY\x{y\^) is asymptotically equal to the 
conditional mean function. Then, we apply the methods of searching for the CMS to 
the constructed model. Based on the discussion above, this constructive approach 
is expected to be more efficient than the inverse regression estimation methods for 
finite samples, and can detect the CS directions exhaustively. 

In the estimation of dimension reduction space, most methods need in one way or 
another to deal with nonparametric estimation. In terms of nonparametric estima- 
tion, the inverse regression estimation methods employ a nonparametric regression 
of X on y while the direct regression estimation methods employ a nonparametric 
regression of Y on X. In contrast to existing methods, the methods in this pa- 
per search for CS from both sides by investigating conditional density functions. 
A similar idea appeared in Yin and Cook (2005) for a general single-index model. 
To overcome the difficulties of calculation, we propose two algorithms in this pa- 
per using a similar idea to Xia, Tong, Li and Zhu (2002). The algorithm solves 
the minimization problem in the method by treating it as two separate quadratic 
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programming problems, which have simple analytic solutions and can be calculated 
quite efficiently. The convergence of the algorithm can be proved. Our constructive 
approach can overcome the disadvantages both in inverse regression estimations, 
requiring a symmetric design for explanatory variables, and also the disadvantage 
in direct regression estimation, of not finding the CS directions exhaustively. Sim- 
ulations suggest that the proposed methods have very good performance for finite 
samples and are able to estimate the CS directions in very complicated structures. 
Applying the proposed methods to two real data sets, some useful patterns have 
been observed, based on the estimations. 

To estimate the CS, we need to estimate the directions Bq as well as the di- 
mension q of the space. In this paper, however, we focus on the estimation of the 
directions by assuming that q is known. 

2 Estimation methods 

As discussed above, the direct regression estimations have good performance for 
finite samples. However, it cannot detect exhaustively the CS directions in com- 
plicated structures. Motivated by these facts, our strategy is to construct a semi- 
parametric regression model such that all the CS directions are captured in the 
regression mean function. As we can see from (1.1), all the directions can be cap- 
tured in the conditional density function. Thus, we will construct a regression model 
such that the conditional density function is asymptotically equal to the regression 
mean function. 

The primary step is thus to construct an estimate for the conditional density 
function. Here, we use the idea of the "double-kernel" local linear smoothing method 
studied in Fan et al (1996) for the estimation. Consider H{,{Y — y) with y running 
through all possible values, where H(v) is a symmetric density function, & > is a 
bandwidth and Hi,{v) = h~^H{y/h). If 6 — > as n — oo, then from (1.1) we have 

m,{x,y) =^ E{H,{Y - y)\X = x) = E{H,{Y - y)\BlX = bIx) ^ /y|sTx(yl4^)- 

See Fan et al (1996). The above equation indicates that all the directions can be 
captured by the conditional mean function mb{x,y) of Hb(Y — y) on X = x with 
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X and y running through all possible values. Now, consider a regression model 
nominally for H}j{Y — y) as 

Hb{Y -y) = m,{X,y)+eMX), 

where = H^{Y ^y)-E{Hb{Y -y)\X) withEeMX) = 0. Let y) = 

E{Hb{Y - y)\BlX = B^x). If (1.1) holds, then mb{x,y) = gb{B^x,y). The model 
can be written as 

Hb{Y -y) = gb{BjX, y) + e,iy\X). (2.1) 

As 6 ^ 0, we have gj,{BQX,y) —> IyibJ xivlBo x). Thus, the directions Bq defined 
in (1.1) are all captured in the regression mean function in model (2.1) if y runs 
through all possible values. 

Based on model (2.1), we propose two methods to estimate the directions. One 
of the methods is a combination of the outer product of gradients (OPG) estima- 
tion method (Hardle, 1991; Samarov, 1993; Xia, Tong, Li and Zhu, 2002) with the 
"double-kernel" local linear smoothing method (Fan et al, 1996). The other one is 
a combination of the minimum average (conditional) variance estimation (MAVE) 
method (Xia, Tong, Li and Zhu, 2002) with the "double-kernel" local linear smooth- 
ing method. The structure adaptive weights in Hristache, Juditski and Spokoiny 
(2001) and Hristache, Juditski, Polzehl and Spokoiny (2001) are used in the estima- 
tions. 

2.1 Estimation beised on outer products of gradients 

Consider the gradient of the conditional mean function mb{x, y) with respect to x. 
If (1.1) holds, then it follows 



dini,(x,y) _ dgb{B^^^ x,y) 



Bo^gb{B^x,y), (2.2) 



dx dx 

where \/g^{vi, ■■■,Vq,y) = {\/ig^{vi, ■ ■ ■ ,Vq,y),- ■ ■ , S/ggti'"!^ ■■■,Vq, y)V with 

Vkg^ivi,--- ,Vq,y) = -K—gb{vi,---,Vq,y), k = l,2,---,q. 

Thus, the directions Bq are contained in the gradients of the regression mean func- 
tion in model (2.1). One way to estimate Bq is by considering the expectation of 



the outer product of the gradients 

It is easy to see that Bq is in the space spanned by the first q eigenvectors of the 
expectation of tlic outer products. 

Suppose that {{Xi,Yi},i = 1,2,- ■■n} is a random sample from {X,Y). To 
estimate the gradient dmi,{x,y)/dx, we can use the nonparametric kernel smoothing 
methods. For simplicity, we adopt the following notation scheme. Let Kq{v'^) be a 
univariate symmetric density function and define K{vi, ■ ■ ■ , Vd) = Kq^v^ + • • • + v"^) 
for any integer d and Kh{u) = h~'^K{u/h), where d is the dimension of u and ^ > 
is a bandwidth. Let Hi,,i{y) = Hb{Yi — y), where H{.) and b are defined above. For 
any {x,y), the principle of the local linear smoother suggests minimizing 

n 2 

1 J2 {Hb,i{y) - « - b'iXi - x)] KhiXi,) (2.3) 



n 

1=1 



with respect to a and b to estimate mi,{x,y) and dmh{x,y)/dx respectively, where 
= Xi — x. See Fan and Gijbels (1996) for more details. For each pair of (Xj, Y^), 
we consider the following minimization problem 

n 2 

(«jfe, bjk) = arg min V \Hbi{Yk) - ajk - bjkXij Wij, (2.4) 

where Xij = Xj — Xj and Wij = Kfi{Xij). We consider an average of their outer 
products 

n n 

k=i j=i 

where pj^ is a trimming function introduced for technical purpose to handle the 
notorious boundary points. In this paper, we adopt the following trimming scheme. 
For any given point {x,y), we use all observations to estimate its function value and 
its gradient as in (2.3). We then consider the estimates in a compact region of {x, y). 
Moreover, for those points with too few observations around, their estimates might 
be unreliable. They should not be used in the estimation of the CS directions and 
should be trimmed off. Let p{-) be any bounded function with bounded second order 
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derivatives on M such that p{v) > ii v > loq; p{v) = ii v < luq for some smah 
ujo > 0. We take pjk = p{f{Xj))p{fY{Yk)), where f{x) and are estimators of 

the density functions of X and Y respectively. The CS directions can be estimated 
by the first q eigenvectors of S. 

To allow the estimation to be adaptive to the structure of the dependency of Y 
on X, we may follow the idea of Hristache et al (2001) and replace Wij in (2.4) by 

Wij = Kh{ty^Xij), 

where S^/^ is a symmetric matrix with (S^/^)^ = S. Repeat the above procedure 
until convergence. We call this procedure the method of outer product of gradient 
based on the conditional density functions (dOPG). To implement the estimation 
procedure, we suggest the following dOPG algorithm. 

Step 0: Set S(o) = Ip and t = 0. 

Step 1: With Wij = K^iJ^^^-^ ^ij): calculate the solution to (2.4) 



^jk ' i=\ 



{E-.<-«)U)a,) y 



where ht and ht are bandwidths (details are given in (2.6) and (2.7) below). 
Step 2: Define = p(/W(X,))p(/W(n)) with 



where X^i^\k = 1, - ■ ■ ,p, are the eigenvalues of T,^^^^ and p = J Ko{ J2 "^l) 
^^(t)^. dvk- Calculate the average of outer products 



j,k=l 
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Step 3: Set t := t + 1. Repeat Steps 1 and 2 until convergence. Denote the 
final value of by S(oo)- Suppose the eigenvalue decomposition of '^[oo) is 
rdiag{\i, ■ ■ • , Ap)r^, where Ai > • • • > Ap. Then the estimated directions are 
the first q columns of F, denoted by BdOPG- 

In the dOPG algorithm, f^Xy) and p\x), t > 0, are the estimators of the 
density functions of Y and B^X respectively. A justification is given in the proof 
of Theorem 3.1 in Section 6.2. In calculations, the usual stopping criterion can be 
used. For example, if the largest singular value of E^^) — £(t+i) is smaller than 10~^ 
then we stop the iteration and take '^(t+i) ^ the final estimator. The eigenvalues 
of S(oo-) can be used to determine the dimension of the CS. However, we will not go 
into the details on this issue in this paper. In practice, we may need to standardize 
Xi = {Xii, ■ ■ ■ , XipY by setting Xj := S~^^'^{Xi — X) and standardize Yi by setting 
Yi := (Yi-Y)/^, where X = n"! E7=i X^ and = n'^ T.7=liX^-XKX^-X)' , 
Y = n-^ Yli=i Yi and = n"^ E"=i(^i - ^)^- Then the estimated CS directions 
are the first q columns of TS~^^^. 

2.2 MAVE based on conditional density function 

Note that if (1.1) holds, then the gradients dmb{x, y)/dx at all (x, y) are in a common 
g-dimensional subspace as shown in equation (2.2). To use this observation, we can 
replace h in (2.3), which is an estimate of the gradient, by Bd{x,y) and have the 
following local linear approximation 

n 

n-' Y,{Hb,i{y) - « - d^B^i^i - x)}^Kh{Xi^), 

i=l 

where d = d(x,y) is introduced to take the role of sjgi,{BQ x,y) in (2.2). Note that 
the above weighted mean of squares is the local approximation errors of Hh^i{y) by 
a hyperplane with the normal vectors in a common space spanned by B. Since 
B is common for all x and y, it should be estimated with aims to minimize the 
approximation errors for all possible Xj and Y^. As a consequence, we propose to 
estimate Bq by minimizing 

n n n 

E E Pi'' Ei^M(^fe) - - d]kB^Xij}^Wij (2.5) 

k=l j=l i=l 
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with respect to ajk,djk = {djki, ■ ■ • jdjkqy ,j,k = 1, ...,n and B : B B = Iq, where 
Pjk is defined above. This estimation procedure is similar to the minimum average 
(conditional) variance estimation method (Xia, Tong, Li and Zhu, 2002). Because 
the method is based on the conditional density functions, we call it the minimum 

average (conditional) variance estimation based on the conditional density functions 
(dMAVE). 

The minimization problem in (2.5) can be solved by fixing {ajk,djk),j,k = 
l,...,n, and B alternatively. As a consequence, we need to solve two quadratic 
programming problems which have simple analytic solutions. For any matrix B = 
{Pi, - ■ ■ , Pd), we define operators £{.) and M{.) respectively as 

^(B) = (/?!, •••,/3j)T and Mi£{B)) = B. 

We propose the following dMAVE algorithm to implement the estimation. 

Step 0: Let be an initial estimator of the CS directions. Set t = 1. 

Step 1: Let B = calculate the solutions of {ajk,djk),j,k = l,...,n, to the 

minimization problem in (2.5) 




where ht and bt are two bandwidths (details are discussed below). 

Step 2: Let = ^(/b^,, (X,))p(/(*)(n)) with ff{y) = n-' Y^^, H,,4y) and 
fB(t){x) = Y,i=i Kht{Bjf-^Xix)- Fixing ajk = afl and djt = dfl, calculate 
the solution of B or e{B) to (2.5) 

k,j,i=l 

n 

X E pflKH,iBl)X,j)xg{H,^4Yk)-afl}, 

k,j,i=l 

where xg = d^^Xij. 
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Step 3: Calculate A(t+i) = {X(b(*+i))}^X(b(*+i)) and = M(b(*+^)A~^f). 

Set t -.= 1 + 1 and go to Step 1. 

Step 4: Repeat steps 1-3 until convergence. Let -B^t^,) be the final value of B(fy 
Then our estimators of the directions are the columns in B(^^y denoted by 

BdMAVE- 

The dMAVE algorithm needs a consistent initial estimator in Step to guarantee 
its theoretical justification. In the following, we use the first iteration estimator of 
dOPG, the first q eigenvector of as the initial value. Actually, any initial 

estimator that satisfies (6.6) can be used and Theorem 3.2 will hold. Similar to 
dOPG, the standardization procedure can be carried out for dMAVE in practice. 
The stopping criterion for dOPG can also be used here. 

Note that the estimation in the procedure is related with nonparametric esti- 
mations of conditional density functions. Several bandwidth selection methods are 
available for the estimation. Sec, e.g. Silverman (1986), Scott (1992) and Fan ct al 
(1996). Our theoretical verification of the convergence for the algorithms requires 
some constraints on the bandwidths although we believe these constraints can be 
removed with more complicated technical proofs. To ensure the requirements on 
bandwidths can be satisfied, after standardizing the variables we use the following 
bandwidths in our calculations. In the first iteration, we use slightly larger band- 
widths than the optimal ones in terms of MISE as 

1 1 
^0 = cqu , bo = Con po+s , (2.6) 

where po = max(p, 3). Then we reduce the bandwidths in each iteration as 

i_ i_ _i 

/ij+i = max{rn/it, Con 9+"}, = max{rn6t, con 9+3, con s} (2.7) 

for t > 0, where r„ = n^^/^'^^°~^^^\ cq = 2.34 as suggested by Silverman (1986) if 
the Epanechnikov kernel is used. Here, the bandwidth b is selected smaller than h 
based on simulation comparisons. 

Fan and Yao (2003, p. 337) proposed a method, called the profile least-squares 
estimation, for the single-index model and its variants by solving a similar mini- 
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mization problem as in (2.5). The method is also possible to be used here for the 
estimation of Bq in (2.1). 

3 Asymptotic results 

To exclude the trivial cases, we assume that p > 1 and q > I. Let fo{y\vi, ■ ■ ■ ,Vq), 
/o(fi, • • • , Vq) and /y(y) be the (conditional) density functions of Y\BJX, BJX 
and y respectively. Let /9o(x,y) = p(/o(-B([x))p(/y(y)), \/fo{y\vi,---,Vq) = 
idfo{y\vi, - ■ ■ ,Vq)/dvi, - ■ ■ ,dfoiy\vi, - ■ ■ ,Vq)/dvqy , hb{u) = E{X\B^X = u) and 
wb{u) = E{XX~^\B^X = u}. For any matrix A, let \A\ denote its largest singular 
value, which is same as the Eculidean norm if ^ is a vector. Let Bq : p x {p — q) 
be such that {Bq, Bq)^ {Bq, Bq) = Ip. We need the following conditions for (1.1) to 
prove our theoretical results. 

(CI) [Design of X] The density function f{x) of X has bounded second order deriva- 
tives on MP; E\X\^ < oo for some r > 8; functions hb{u) and wb{u) have 
bounded derivatives with respect to u and B for B in a small neighbor of Bq: 
\B - Bq\ <S for some S > 0. 

(C2) [Conditional density function] The conditional density functions fvixivl^) 
and fY\B^x (yW) have bounded fourth order derivatives with respect to x, 
u and B for S in a small neighbor of Bq; the conditional density function of 
/bTxyibTxC^' 1 1 V fo{y\'^)\dy are bounded for all u,y and v. 

(C3) [Efficient dimension] Matrix Mq = J po{x,y) y HyW^x) \f fQ{y\Blx)f{x) 
/y {y)dxdy has full rank q. 

(C4) [Kernel functions] Kq{v'^) and H{v) are two symmetric univariate density func- 
tions with bounded second order derivatives and compact supports. 

(C5) [Bandwidths for consistency] Bandwidths Hq = cin~^^ and 6o = C2n~^'' where 
< rh,n < l/(Po + 6), pq = max{p,3}. For t > 1, ht = max{r„/tt_i, ^} 
and bt = max{r„6t_i,S} where r„ = n"'"'"''^, h = c^nT'^h,^ = C4n~'^b with 
< r^, < l/{q + 3), and ci, C2, C3, C4 are constants. 

In (CI), the finite moment requirement for \X\ can be removed if we adopt 
the trimming scheme of Hardle et al (1993). However, as noticed in Delecroix et al 
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(2004), this scheme caused some technical problems in the proofs. Based on assump- 
tions (C2) and (C4), the smoothness of gb{u,y) is implied. Lower order of smooth- 
ness is sufficient if we are only interested in the estimation consistency. The second 
order differentiable requirement in (C4) can ensure the Fourier transformations of 

the kernel functions being absohitely integrable; see Chung (p. 166, 1968). The pop- 
ular kernel functions such as Epanechnikov kernel and quadratic kernel are included 
in (C4). The Gaussian kernel can be used with some modifications to the proofs. 
Condition (C3) indicates that the dimension q cannot be further reduced. For ease 
of exposition, we further assume that /xoi? = / H{v)dv = 1,ij,2H = J v^H{v)dv = 1, 
HQq = J K{vi, ■■ , Vq)dvi ■■■dvq = 1 and fi2q = J K{vi, ■ ■■ , Vq)vldvi ■■■dvq = 1; 

"1^/2 1/2 1 

otherwise, we take H{v) := H{v / T2jj ) / and K{vi ,Vq) = /Xq^ K{vi / -y/JJoq, ■ ■ , 
Vq/-s/Jl2q) I \/Wiq- The bandwidths satisfying (C5) can be found easily. For exam- 
ple, the bandwidths given in (2.6) and (2.7) satisfy the requirements. Actually, a 
wider range of bandwidths can be used; see the proofs. Let vb{x) = /xb(B^x) — x, 

wb{x) = wb{B^x) — fiB{B^ x) fij^{B^ x) and fo{x) = fQ{B^x). For any square ma- 
trix A, A^^ and A'^ denote the inverse (if it exists) and the Moore-Penrose inverse 
matrices respectively. 

Theorem 3.1 Suppose conditions (C1)-(C5) hold. Then we have 

\BdOPGBjoPG - BoB^l = Oih"" + + 6q^d^ + d^/d^ + n-V^) 

in probability as n ^ oo, where (5^^) = {nh% / log n)~^/^ and 5n = (log n/n)^/^. If 
^'^ + ^qhb + ^qKb^'^ + <^n/^^ = o(n~^/^) Can be satisfied, then 

V^{e{BaoPGBjoPGBo) - iiBo)} ^ N{0, Wo), 

where 

Wo = Var[poiX,Y)M^\s7foiY\B],X)UiY)-E{^fo(Y\B],X)UiY)\X}) 

The first part of Theorem 3.1 indicates that BaopG is a consistent estimator of 
an orthogonal basis, BqQ with Q = Bq BdoPG, in CS and \BdOPG — BqQ\ = 0{h'^ + 
S'^f^+6qM,t)'^+Sn/t>^+n^^^^) in probability. See Bai et al (1991) and Xia, Tong, Li and 
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Zhu (2002) for alternative presentations of the asymptotic results. If the bandwidths 
in (2.7) are used, then the consistency rate is 0(n~^/'^5+^)+^/(*+^) logn + n~^/^) in 
probability. Faster consistency rate can be obtained by adjusting the bandwidths. 
The convergence of the corresponding algorithm is also implied in the proof in section 
6. If < 3, then the condition for the normality can be satisfied by taking 

If we use higher order polynomial smoothing, it is possible to show that the root-n 
consistency can be achieved for any dimension q. See, e.g. Hardle and Stoker (1989) 
and Samarov (1993), where the higher order kernel, a counterpart of the higher 
order polynomial smoother, was used. However, using higher order polynomial 
smoothers increases the difficulty of calculations while the improvement of finite 
sample performance is not substantial. 

Theorem 3.2 If conditions (C1)-(C5) holds, then 

\BdMAVEBjMAVE - BqEJ \ = 0{fl' + + dg^h'' + d^/h^ + n'^l^} 

in probability as n — oo. If + + Sgf^d^ + (5^/5^ = o{n~^/^) can be satisfied, 
then 

V^{iiBdMAVEBjMAVEBo) - ^(^o)} ^ N{0,D+EoD+), 

where Dq = J po{x,y) V /o(y|-B([x) sffoiy\Blx) (g) {ug^{x)ul^{x)] fo{x)fY{y)dxdy 
and 

Eo = Var\p^{X,Y){s/fo{Y\BlX)U{Y) - E{s/ U{Y\BIX)U{Y)\X}) ® u^^{X)]. 

The proof of Theorem 3.2 is given in section 6. The convergence of the dMAVE 
algorithm is implied in the proof. Similar remarks on dOPG are applicable to 
dMAVE. Moreover, B^mave converges to BqQ, where Q is determined by the initial 
consistent estimator of the directions. For example, Q = B^^^Bq if is used as the 
initial estimator. Similarly, the root-n consistency holds for q < 3. It is possible that 
the root-n consistency holds for g > 3 if higher order local polynomial smoothing 
method is used. In spit of the equivalence in terms of consistency rate for both 
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dOPG and dMAVE, our simulations suggest that dMAVE has better performance 
than dOPG in finite samples. Theoretical comparison of efficiencies between the 
two methods is not clear. In a very special case when q = I and the CS is in the 
regression mean, Xia (2006a) proved that dMAVE is more efficient than dOPG. 

Wc here give some discussions about the requirements on the distributions of 
X and Y. If Y is discrete, we can consider the conditional cumulative distribution 
functions and have FY\x{y\x) = Fy^s^ xivl-^o ^) when Y A. X\Bq X holds. Similar 
to (2.1), we can consider a regression model 

liY <y) = GiB^X,y) + e{y\X), 

where G{Bjx,y) = E{I{Y < y)\X = x} = E{I{Y < y)\BjX = BJx} and 
e{y\X) = I{Y < y) — G{BQX,y). Similar theoretical consistency results are possible 
to be obtained following the same techniques developed here. If some covariates in 
X are discrete, our algorithms in searching for a consistent initial estimator will fail. 

However, if a consistent initial estimator can be found by for example the methods 
in Horowitz and Hardle (1996) and Hristachc, Juditski, Polzehl and Spokoiny (2001) 
and that B^ X has a continuous density function for all B in a neighbor around Bq, 
then our theoretical results in the above theorems still hold. 

4 Simulations 

We now demonstrate the performance of the proposed estimation methods by simu- 
lations. We will compare them with some existing methods including SIR (Li, 1991), 
SAVE (Cook and Weisberg, 1991), PHD (Li, 1992) and rMAVE (Xia, Tong, Li and 
Zhu, 2002). The computer codes used here can be obtained from www.jstatsoft.org/ 
vOl/iOl/ for SIR, SAVE and PhD methods (Courtesy of Professor S. Weisberg) and 
www.stat.nus. edu/~ycxia/ for rMAVE, dOPG and dMAVE. In the following calcu- 
lations, we use the quadratic kernel H{v) = Ko{v^) = (15/16)(1 — u^)^/('u^ < 1) and 
uq = 0.01. The bandwidths in (2.6) and (2.7) are used. For the inverse regression 
methods, the number of slices is chosen between 5 to 30 that is most close to n/ (2p). 
We define an overall estimation error of estimator B : W B = Iq by the maximum 
singular value of BqB^ — BB^; see Li et al (2004). 
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Example 4.1 Consider model 



Y = sign(2XT/3i + e{) log(|2XT/52 + 4 + ssl), 



(4.1) 



where sign(-) is the sign function. Coordinates X ~ N{0,lp), unobservable noises 
£i ~ A^(0, 1) and £2 ~ -^(0,1) are independent. For (3i, the first 4 elements are 
all 0.5 and the others are zero. For P2, the first 4 elements are 0.5,-0.5,0.5,-0.5 
respectively and all the others are zero. A similar model was investigated by Chen 
and Li (1998). In order to show the effect on the estimation performances of the 
number of covariates, we vary p in the simulation. With different sample sizes, 200 
replications are drawn from the model. The calculation results are listed in Table 1. 
To get an intuition about the quantity of estimation errors, Figure 1 shows a typical 
sample of size n = 200 and its estimate with estimation error 0.21. The structure 
can be estimated quite well in the sample. 



>-0 




>-0 




=-0 



T 

P2X 



Figure 1: A typical data of size 200 from Example 4-1 with p = 10 to show the quantity of 
estimation error and its graphic performance. The left two panels are plots of y against the 
true CS directions; the right two panels y against the estimated directions using dMAVE. 
The estimated directions are respectively f3i — (0.42, 0.64^, 0.^4, 0.45, -0.01, -0.07, 0.02, 
-0.00, -0.08, 0.01)^ and P2 = (-0.54, 0.43, -0.57, 0.43, 0.01, -O.O4, -0.01, 0.07, -0.05, 
0.07)^ with estimation error 0.21. 



Table 1: Mean (and standard deviation) of estimation errors for Example 4.1 



n p 


dOPG dMAVE rMAVE SIR SAVE PHD 


5 

100 10 
20 


0.25(0.09) 0.22(0.08) 0.43(0.19) 0.29(0.09) 0.87(0.19) 0.72(0.22) 
0.55(0.19) 0.35(0.07) 0.64(0.19) 0.46(0.10) 0.94(0.06) 0.90(0.13) 
0.81(0.13) 0.54(0.10) 0.88(0.12) 0.64(0.11) 0.96(0.06) 0.93(0.07) 


5 

200 10 
20 


0.17(0.05) 0.14(0.04) 0.27(0.13) 0.19(0.05) 0.55(0.26) 0.47(0.15) 
0.32(0.09) 0.24(0.06) 0.46(0.17) 0.30(0.06) 0.96(0.08) 0.73(0.16) 
0.62(0.15) 0.36(0.06) 0.66(0.16) 0.43(0.06) 0.93(0.04) 0.94(0.08) 


5 

300 10 
20 


0.13(0.04) 0.13(0.04) 0.19(0.07) 0.16(0.05) 0.32(0.18) 0.37(0.12) 
0.24(0.06) 0.18(0.04) 0.36(0.16) 0.24(0.05) 0.85(0.17) 0.59(0.15) 
0.48(0.13) 0.28(0.05) 0.55(0.16) 0.35(0.05) 0.92(0.03) 0.84(0.12) 


5 

400 10 
20 


0.11(0.04) 0.11(0.04) 0.21(0.12) 0.14(0.04) 0.22(0.11) 0.31(0.10) 
0.21(0.04) 0.16(0.04) 0.31(0.11) 0.21(0.05) 0.66(0.22) 0.51(0.13) 
0.31(0.06) 0.25(0.04) 0.49(0.15) 0.29(0.04) 0.98(0.04) 0.76(0.14) 
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In model (4.1), the CS directions are hidden in a comphcated structure and are 
not easy to be detected directly by the conditional regression mean function. When 
sample size is large (> 200) and p is not high (= 5), all the methods have accurate 
estimates. As p increases, rMAVE performs not so well because the second direction 
is not captured in the regression mean function; SAVE and PHD also fail to give 
accurate estimates. SIR performs much better in all the situations than SAVE and 
PHD. dOPG has about the same performance as SIR. dMAVE is the best in all 
situations among all the methods. 

Example 4.2 Now, consider the CS in conditional mean as well as the conditional 
variance as in the following model 



(4.2) 



where X = (xi, • • • , xio)^ with xi, • • • , xio ~ Uniform{—\/^, \/3) and e ~ iV(0, 1) 
are independent, /?i = (1, 2, 0, 0, 0, 0, 0, 0, 0, 2)^ /3 and P2 = (0, 0, 3, 4, 0, 0, 0, 0, 0, 0)^ 
/5. For model (4.2), one CS direction is contained in the regression mean and the 
other in the conditional variance. One typical data with size 200 is shown in Figure 
2. Table 2 lists the calculation results of 200 replications. 



>>0 




Figure 2: A typical data with n = 200 from Example ^.2 and its dMAVE estimation. The 
left two panels are plots of y against the true CS directions respectively; the right two panels 
y against the estimated directions respectively with estimation error 0.31. 



Because rMAVE cannot detect the CS directions hidden in the conditional vari- 
ance directly, it has very poor overall estimation performance as listed in Table 2. If 
d = 1, i.e. the regression mean function is monotonic, SIR works reasonably well; if 
d = 2, the regression mean function is symmetric and SIR fails to find the direction 
hidden in the regression mean. As a consequence, its performance is very poor. 
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The performances of SAVE and PHD are also far from satisfactory though they are 
apphcable to the model theoretically. The proposed dOPG and dMAVE perform 
very well and are better than the existing methods listed in the table. 



Table 2: Mean (and standard deviation) of estimation errors for Example 4.2 



d 


n 


dOPG 


dMAVE 


i-MAVE 


SIR 


SAVE 


PHD 


1 


100 
200 
400 


0.57(0.15) 
0.36(0.08) 
0.24(0.05) 


0.44(0.12) 
0.28(0.06) 
0.18(0.04) 


0.85(0.13) 
0.76(0.16) 
0.68(0.15) 


0.63(0.15) 
0.42(0.09) 
0.29(0.06) 


0.93(0.08) 
0.91(0.12) 
0.64(0.16) 


0.99(0.08) 
0.98(0.07) 
0.97(0.07) 


2 


100 
200 
400 


0.63(0.19) 
0.33(0.10) 
0.22(0.05) 


0.46(0.16) 
0.28(0.06) 
0.19(0.04) 


0.85(0.16) 
0.70(0.18) 
0.66(0.19) 


0.96(0.09) 
0.95(0.07) 
0.95(0.09) 


0.90(0.06) 
0.87(0.11) 
0.85(0.12) 


0.91(0.11) 
0.88(0.11) 
0.89(0.11) 



Example 4.3 In this example, we demonstrate the consistency rates of the esti- 
mation methods by checking how the estimation errors change with sample size n. 
Consider model 



Y 



Xl 



+ X3{x-i + Xi + l) + O.le, 



(4.3) 



0.5 + (1.5 + X2)2 

where e ~ A^(0, 1) and X ~ A^(0, /lo) are independent. Model (4.3) is a combination 
of the two examples in Li (1991). For this model, all the theoretical requirements 
for the methods are fulfilled. Therefore, it is fair to use the model to check their 
consistency rates. 




400 600 800 
sample size n 



1000 




400 600 800 
sample size n 



1000 



Figure 3: The calculation results for Example 4-3 using different estimation methods. The 
lines are the mean of estimation errors with different sample size and 200 replications. The 
left panel is the plot of the errors against sample size; the right panel is the errors multiplied 
by root-n against sample size. 



In the left panel in Figure 3, the proposed methods have much smaller estima- 
tion errors than the inverse regression estimations. Because all the directions are 
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hidden in the regression mean function, it is not surprising that rMAVE has the 
best performance. Multiphed by root-n, the errors should keep in a constant level 
if the theoretical root-n consistency is applicable to the range of sample size. The 
right panel suggests that the estimation errors of SIR and SAVE do not start to 
show a root-n decreasing rate for the sample size up to 1000, while PHD, rMAVE, 
dOPG and dMAVE demonstrate a clear root-n consistency rate. 

Example 4.4 In our last example, we consider a model with a very complicated 
structure. Suppose {Xi,Yi),i = 1,2, •••,n, are drawn independently from model 
Y = l3j X/2 + e{l - \l3jX\'^y/^, where {X,e) satisfies {X ~ iV(0,/io),e ~ A^(0, 1) : 
\f3jX\ < 1, \PJX\ < 1,0.5 < WjXfil - £2) + £2 < 1}^ and ^2 are defined in 
Example 4.1. The calculation results based on 200 replications are listed in Table 3. 
Because of the complicated structure as shown in Figure 4, the CS directions are not 
easy to be estimated and observed directly. However, with moderate sample size, 
the proposed methods can still estimate the directions accurately. It is interesting 
to see that SAVE also works in this example. 



Table 3: Mean (and standard deviation) of estimation errors for Example 4.4 



n 


dOPG 


dMAVE 


rMAVE 


SIR 


SAVE 


PHD 


200 
400 
600 
800 
1000 


0.5909(0.29) 
0.2117(0.19) 
0.1148(0.04) 
0.0876(0.03) 
0.0782(0.02) 


0.5089(0.30) 
0.1498(0.10) 
0.1059(0.03) 
0.0862(0.02) 
0.0779(0.02) 


0.9411(0.07) 
0.9573(0.05) 
0.9725(0.03) 
0.9744(0.03) 
0.9671(0.04) 


0.8770(0.12) 
0.8783(0.13) 
0.8758(0.13) 
0.8737(0.14) 
0.8819(0.13) 


0.9242(0.19) 
0.7677(0.18) 
0.5357(0.21) 
0.3657(0.13) 
0.2604(0.06) 


0.9833(0.05) 
0.9789(0.03) 
0.9799(0.03) 
0.9757(0.04) 
0.9789(0.04) 



Based on the simulations, we have the following observations. (1) The existing 
methods (rMAVE, PHD, SIR and SAVE) fail in one way or another to estimate the 
CS directions efficiently, while dOPG and dMAVE are efficient for all the examples. 
(2) dOPG and dMAVE demonstrate very good finite sample performance, even a 
root-n rate of estimation efficiency, while some of the existing methods do not show 
a clear root-n rate in the range of sample sizes investigated. (3) dOPG and dMAVE 
are less sensitive to the number of covariates than PHD, SAVE and SIR. Simulations 
not reported here also suggest that the asymmetric design of X has less effect on 
dOPG and dMAVE than that on the inverse regression estimations. (4) If the CS 
directions are all hidden in the regression mean function, rMAVE is the best and 
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should be used. Otherwise, dOPG and dMAVE are recommended. 
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Figure 4: yl typical data from Example 4-4 with n — 200 and its dMAVE estimation. The 
upper three panels are plots of y against the true CS directions and y — x^/3i/2 against 
the second direction respectively; the lower three panels are plots of y against the estimated 
CS directions (with estimation error 0.32) and y — a:^/3i/2 against the second estimated 
direction respectively. 



5 Real data analysis 

Example 5.1 (Cars data) This data was used by the American Statistical Asso- 
ciation in its second (1983) exposition of statistical graphics technology. The data 
set is available at http://lib.stat.cmu.edu/datasets/cars.data. There are 406 observations 
on 8 variables: miles per gallon (y), number of cylinders {Xi), engine displacement 
(^2), horsepower (X3), vehicle weight (X4), time to accelerate from to 60 mph 
(X5), model year {X^), and origin of a car (1. American, 2. European, 3. Japanese). 

Now we investigate the relation between response variable Y and covariates X = 
{Xi, • • • , Xg)^, where Xi, • • • , Xg are defined above, Xj = 1 if a car is from America 
and otherwise; X^ = 1 if it is from Europe and otherwise. Thus, {Xj,X^) = 
(1,0), (0,1) and (0,0) correspond to American cars, European cars and Japanese 
cars respectively. For ease of explanation, all covariates are standardized separately. 
When applying dOPG to the data, the first 4 largest eigenvalues are 21.1573, 1.6077, 
0.2791 and 0.2447 respectively. Thus, we consider CS with dimension 2. Based on 
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dMAVE, the two directions (coefficients of X) are estimated as (3i = (-0.33, -0.45, 
-0.45, -0.53, 0.14, 0.42, 0.00, -0.02)^ and (32 = (0.00, 0.15, -0.10, -0.23, -0.12, -0.17, 
-0.88, 0.29)^ respectively. The plots of Y against 'pj X and {5^ X are shown in 
Figure 5. 
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Figure 5: The estimation results for Example 5.1 using dMAVE. The two panels are plots 
of Y against the two estimated CS directions respectively. The origins of cars are denoted 
by "■ " for American cars, "x " for European cars, and "o" for Japanese cars. 

Based on the estimated CS directions and Figure 5, we have the following in- 
sights to the data. The first direction highlights the common structure for cars of 
all origins: miles per gallon (Y) decreases with number of cylinders (Xi), engine 
displacement {X2), horsepower {X3) and vehicle weight {X4), and increases with 
the time to accelerate (X^) and model year {Xq). The second direction indicates 
the difference between American cars and European or Japanese cars. 



Example 5.2 (Ground level Ozone) Air pollution has serious impact on the 
health of plants and animals (including humans); see the report of the World Health 
Organization (WHO) (2003). Substances not naturally found in the air or at greater 
concentrations than usual are referred to as "pollutants". The main pollutants 
include nitrogen dioxide (NO2), carbon dioxide (CO), sulphur dioxide (SO2), res- 
pirable particulates, ground-level ozone (O3) and others. Pollutants can be classified 
as either primary pollutants or secondary pollutants. Primary pollutants are sub- 
stances directly produced by a process, such as ash from a volcanic eruption or the 
carbon monoxide gas from a motor vehicle exhaust. Secondary pollutants are prod- 
ucts of reactions among primary pollutants and other gases. They are not directly 
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emitted and thus cannot be controlled directly. The main secondary pollutant is 
ozone. 

Next, we investigate the statistical relation between the level of ground-level 
ozone with the levels of primary pollutants and weather conditions by applying our 
method to the pollution data observed in Hong Kong (1994-1997, http://www.hku.hk 
/statistics/paper/) and Chicago (1995-2000, http://www.ihapss.jhsph.edu/data/data.htm). 
This investigation is of interest in understanding how the secondary pollutant ozone 
is generated from the primary pollutants and weather conditions. Let Y, N,S,P,T 
and H be the weekly average levels of ozone, nitrogen dioxide (NO2), sulphur diox- 
ide (SO2), respirable particulates, temperature and humidity respectively. To in- 
clude the interaction between primary pollutants and weather conditions into the 
model directly, we further consider their cross-products resulting in 15 covariates 
all together, denoted by X. For ease of explanation, all covariates are standard- 
ized separately. For all possible working dimensions, only the first two dimen- 
sions show clear relations with Y. We further calculate the eigenvalues in dOPG. 
The largest four eigenvalues are 10.78,2.93,2.11,1.70 respectively for Chicago, and 
6.89, 1.24, 0.69, 0.52 for Hong Kong. Now we consider the dimension reduction with 
efficient dimension 2 although the estimation of the mimber of dimension needs 
further investigation. The estimates for the first two directions are given in Table 
4. 



Table 4: The estimated CS directions in Example 5.2 



City 


Direction 


N 


S 


P 


T 


H 


N*S 


N*P 


N*T 




/3i 


0.10 


-0.13 


-0.06 


-0.00 


-0.00 


0.06 


0.29 


0.19 


Chicago 


02 


-0.10 


-0.11 


0.39 


-0.25 


-0.07 


0.12 


-0.15 


0.09 




01 


0.32 


-0.15 


0.23 


0.10 


-0.41 


-0.07 


0.20 


0.42 


Hong Kong 


P2 


-0.04 


-0.08 


-0.12 


0.18 


0.19 


-0.21 


0.35 


0.17 


city 


Direction 


N*H 


S*P 


S*T 


S*H 


P*T 


P*H 


T*H 






01 


0.04 


-0.18 


0.27 


-0.01 


-0.06 


0.36 


0.77 




Chicago 


02 


-0.51 


0.46 


-0.20 


-0.21 


-0.15 


-0.16 


0.32 






01 


0.10 


0.01 


-0.05 


-0.31 


0.53 


0.12 


-0.14 




Hong Kong 


02 


-0.52 


-0.26 


-0.18 


0.42 


0.22 


-0.29 


-0.19 





The plots of Y against the two estimated directions are shown in Figure 6. 
The plots show strong similar patterns in the two separated cities. If we check 
the estimated coefficients (directions), NO2 and particulates (or their interaction) 
are the most important pollutants that affect the level of ozone. Temperature and 
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humidity and their interaction are the other important factors. The interactions of 
weather conditions with NO2 and particulates also contribute to the variation of 
ozone levels. These statistical conclusions give support to the chemical claim that 
ozone is formed by chemical reactions between reactive organic gases and oxides of 
nitrogen in the presence of sunlight; see the report of WHO (2003). 
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Figure 6: The estimation results for Example 5.2 using dMAVE. The upper two panels are 
the levels of ozone against the first two estimated CS directions in Hong Kong, the lower 
two panels are those in Chicago. 



6 Proofs 

6.1 Basic ideas of the proofs 

The basic idea to prove the theorems is based on the convergence of the algorithms 
and that the true dimension reduction space is the attractor of the algorithms. We 
here give a more detailed outline for the proof of Theorem 3.2. Suppose the estimate 
of -Bo in an iteration of the dMAVE algorithm is B(^^-^ . It follows from Step 2 that 

n _, 

k,j,i=l 
n 

X E P?kKhABl)X,,)xl!;i{H,4Y,) - afl - ^(i?of xgi}, (6.1) 
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where X^'^, is defined in the algorithm. By the decomposition in Step 3, we obtain 
estimate -6(^+1) in the next iteration. If the initial value is a consistent estimator 
of -Bo, by Lemmas 6.3, 6.4 and 6.5 below, we will obtain a recurring relation for the 
iterations as 

with \Qt\ < 1 and \Tn,t\ = o(l) almost surely when t > 1. Therefore, the dimension 
reduction space is an attractor in the algorithm. This recurring relation is then used 
to prove the convergence of the algorithm and the consistency of the final estimator. 
To ensure the convergence of the algorithm, we need to consider the consistency 
with probability 1. 

The details of the proofs are organized as follows. In section 6.2, we first list a 
series of lemmas. Lemmas 6.1-6.5. Based on these Lemmas the theorems are then 
proved. The proofs of Lemmas 6.1-6.5 are algebraic albeit complex calculations of 
Lemmas 6.6 and 6.7. They can be found in Xia (2006b) and are available upon 

request. Lemmas 6.6 and 6.7 are two basic results used in the proof dealing with 
the uniform consistency. Their proofs are given in section 6.3. 

6.2 Proofs of the theorems 

We first introduce a set of notations. Let e6,i(y) = Hi,(Yi — y) — E{Hf,(Yi — y)\Xi), 
Vy C M be a compact interior support of Y , i.e. for any v G , there exists ^ > 
such that '^'^^y:\y-v\<5 fyiy) > 0. Similarly, we can define a compact interior support 
for X. For B C {B : B^B = Iq}, define Sb = max{|S - Bq] : B e B}. For any 
index set Z and random matrix An{z), we say An{z) = 0{an\z E Z), ot An{z) = 
0{an) for simplicity, if sup^^2:\^n{z)\/an = 0{1) almost surely. As usual. An = 
Op{an) indicates that every term in An is 0(a„) in probability as n — ^ oo. Recall 
that Bo = (/3oi, /3o2, • • • , Poq) and B = {Pi, 02, • • • , Pq)- Let Hlf{x) = gb{Bjx, y) + 
vMBlx,y)B-^X,^, Hlf{x) = EU=iVlMB],x,yM XMX,,)/2 and 
H!'f{x) = Y.'U,r=i{vUr%{Bl^^y) {01X,,,){PIXMX,,)]/Q, where X,, = X,- 
X, \/gb{vi, ■ ■ ■ ,Vq,y) is defined in Section 2 and 

v'l,K9b{vu--- ,Vq,y) = ^^^^ gb{vi, - ■ ■ ,Vq,y) for l, k = 1,2, ■ ■ ■ ,q, 
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and \/\^r,i9b is defined naturally. By Taylor expansion of gb{BQXi,y) at B^x, it 
follows from model (2.1) that 

H,,,{y) = Hlf{x) + Hlf\x) + Hlf{x) + e,,,{y) + 0{\BlX,,t) (6.2) 

almost surely. Let 5„ih = (n/i™/ log n)~^/^, 5„ihb = (^^/i™b/ log n)^^/^ for any integer 
m, 4 = (n6/logn)-^/2, 5„ = (log n/n)^/^ ^j^^j ^^^^ = /j2 _^ ^4 _^ _^ ^^^^^^^ l^.^. j 

and /y be the density functions of X, X and Y respectively. Again, for simplicity, 
we write fsix), ^ib{x),wb{x) for fB{B^ x), ij,b{B^ x) and wb{B^x) respectively; see 
also the definitions in Section 3. Let c, cq, ci, • • • , be a sequences of positive constants, 
while c may have different values at different places. 

Lemma 6.1 [Kernel smoother in the first iteration] Let 

(cj - { t -y^.^) (;,:/,) ujr t ^^(^«) (;.:/,) 

Under assumptions (CI), (C2) and (C4), if /i ^ 0, 6 — > and nh^^'^b/ logn oo, 
then we have 

I 1 

axy=9b{Bj x,y) + -"^V^ngbiBo x,y)h^ + 0{h^ + 6phb\x eV^,y e V^), 

K=l 

n 

bxy=Bo\7 gb{Blx,y) + {l^2pnh^ fix)}'^"^ Kh{Xi^)Xix£b,i{y) 

i=l 

+0{h'^ + Sphb\x G G Py). 

Lemma 6.2 [Kernel smoother in dOPG] Define Vq = {D = i?diag(Ai, • • • , Xq)B^ + 
5diag(Ag+i, • • • , Xp)B^: {B, Bf{B, B) = Ip, a > min(Ai, ■ ■ ■ , Xq) > cq > 0, B e B 
and max(Ag+i, • • • , Xp)/h?' < e„}. Let 

.f(.,^„-i:A,(z,v.x..,(;jQj 

and 

(if) = {^^n (^)}"'E^'^(^'^'^^-)QJ^M(y)• 
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1 

a^y=9b{Bjx, y) + ^Yl "^IMBqx, y)h^ + 0{h? + 5qhb\x e V^,y e V^, D e Vg), 



Under assumptions (CI), (C2) and (C4), if n/i^+^b/log n oo, b 0,h 0, 
Ss/h — ^ and — ^ 0, then we have 

2 

bg = Bo{s7gbiBlx, y) + ^(/i^ + + e„)} + ^T^ol^^, y) 
where e^/j;, = /i^ + (/i^ + 6qh)6qhb + {h? + (J^/jbjen + (/i + Sqhb/h)SB and 

g n 
T=l i=l 

Lemma 6.3 [Kernel smoother in dMAVE] Let 



and 



Under assumptions (CI), (C2) and (C4), if nh%/\ogn — *■ oo, 6 — 0, /i — and 
Ss/h ^ 0, then 

2 

K=l 

+Vf„(x, y) + 0{h^ + + Ms + 4|x G P^, y G P^, 5 G S), 



1 

«fj/=56(-B^x, y) + \j^gb{Blx, y){Bo - Bfv^ {x) + y^^gbiB^x, y)h^ 



d^yh=S7gb{B])X, y)h + Mg{x, y)h^+ V^^ix, y) 

+0(^V V V + Mb + (5||x G , y e , B e B) , 

where 

Vf„(x, y) = {l + Mg{x, h)h}£^^, {x, y) + Mg{x, h)h£^^^{x, y), 

Vg{x, y) = Mg{x)hS^^,{x, y) + {l + Mg{x, h)h}£g^^{x, y), 

Mg{x),k = 1, 2, • • • , 5, are bounded continuous functions (details can be found in 
the proofs) and 

n 

£n,iix,y) = {nfB{x)}-'Y.KhiB''Xi,)eb,iiy), 

1=1 

n 

£n,2ix,y) = {nhfB{x)}-'Y,Kh{B^Xi,)B^Xi,eb,i{y). 

i=l 



25 



Lemma 6.4 [Denominator of dMAVE] Let pf^ = p{fB{Xj))p{fY{Yk)), where 

n n 

Mx) = KhiB'X,,), U{y) = n'^ ^ H,{Yi - y). 

i=l i=l 

Let Xf-f^ = df^®Xij where df^, = d^.y^. Suppose (C1)-(C4) hold and nhi+%/logn 
— oo, nb^/ log n ^ oo, b ^ 0,h ^ and Ss/h — 0. We have 

" -1 
{^"' E pfkKhiB'Xi^)Xf^,iXf^,)'] = (7,®S)Lf(7,®B>-2 + (7,®B)L2 

k,j,i=l 

+Ls{Ig B^) + + 0{{rghb + Sqhb)/h\B G B), 

where Li,L2 and L3 are constant matrices (details can be found in the proof) and 
Db = J pUb{x))p{U (y)) V 9b{Blx,y) ^'^gbiB^x, y) ® {vj^{x)vl{x)]f{x)f{y)dxdy. 

Lemma 6.5 [Numerator of dMAVE] Suppose conditions (C1)-(C4) hold. If 6 ^ 
0, /i — >■ 0, nh'^b/ logn — > 00, n5^/ logn — > 00 and Ss/h ^ 0, then 

n 

n''' J2 pfkKh{B'X,,)Xf^,{H,,{Yk) - af, - iiBoY Xf^,} = DB{m - i{B^)) 

k,j,i=l 

+^n{Bo) + Oih" + rghbSqhb + Slhb + ^l/b^ + i^qhb/h + h)dB\B e B}, 

where af,^ = a^^^^ , $„(Bo) = 0{5n + S^f^Jh) almost surely and $„(Bo) = Op(n-V2) 
with {Ig<^Bj)^n{Bo) = and ^/n^niBo) ^ A^(0, Sq), where So is given in Theorem 
3.2. 

Proof of Theorem 3.1 By Lemma 6.1, write 

n 

bxy = BoCn{x, y) + {l^2pnhlf{x)y'^ ^ Kh^ {Xix)Xi:c£b^^i{y) + BQO{hl + 6phobo), 

i=l 

where {Bq,Bq) is a p x p orthogonal matrix and Cn{x,y) = '\/gfj{Bj x,y) + ©(/ig + 
^ph^fio)- By Lemma 6.6, the second term on the right hand side above \sO{5ph^i)^/hQ)- 
It follows from step 2 in the dOPG algorithm that 

n 

^(1) = (-Bo, -Bo)C„(i?o, -Bo)^+ n"^^ {Sijk + Sjjf.) 

i,j,k=l 

+0{{hl + Sphobo)Sphobo/ho}, (6.3) 
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n 

Cn = n 

j,k=l 



where and p^-^^ are defined in the algorithm, Sijk = pfk {l^2phQf{Xj)} ^ 

Bo V 9b, {BlXj, Yk)Kh^ {Xij)Xlei,^^^,{Yk) and 

-2 (0) / '^n 

{Xj,Yk) \( Cn{Xj,Yu) V 

Z^^P^k \o{hl + 5,nobo)) \0{hl + 5,u,b,)) 
where K^n'^ = n-^Y.lk=ipfkCn{Xj,Yk)cl{X,,Yk). By Lemma 6.6, we have (y) = 

fAy) + f"iyK/2 + + \ \y e v^), = fix) + + <5p^, \x g p^). By 

the definition of /)(.), we have p^°^ = (y)) + O(rp/iofeja; eRP,y e M), 

where P6o(/.(y)) = (?/))+ P'(/.(y))/^(y)&i/2. Let 

x{p2phy{X,)}-^K^^iXij)Xleh^,i{Yk). 

By (C5) and Lemma 6.7, we have n-^^l^^^^^Sijk = 0{{Sn + (^^^^^ + 5^/5^)/^o}- 
Thus, 

n n 

E Sijk = n-' Sijk + 0{rpn,bMo^ho'} = oC&, (6.4) 

i,j,k=l i,j,k=l 

where 

aI^^ = (^n/^o + ^Ihob/ho + Sl/{blho) + h^ + rph^^b^Sph^^b^h^^ ■ 

By (C3) and the strong law of large numbers for U-statistics (cf. Hoeffding, 1961), 
= /p(/(a;))p(/y(y)) V 9bo{B^ x,y) v^5&o(^([ x,y)}fix) fAy)dxdy + o(l) al- 
most surely, which is of full rank asymptotically. Thus its eigenvalues are greater 
than a positive constant asymptotically. On the other hand, the eigenvalues of the 
lower right principal submatrix in C„ are of order A^^ Let A^^^^ > ... > Ap^^ be 
the eigenvalues of S(i) and fi^\ ■ ■ ■ , Pp''^ be the corresponding eigenvectors. By 
the interlacing theorem (cf. Ando, 1987), we have min{A^^\ ■ • • , Aq^^} > c and 
max{Aj^_Ji, • • • , Aj,^^} = 0(X^'^). By (6.3) and (6.4) we have 
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By Lemma 3.1 of Bai et al (1991), we have 

5(1)^5) -5oSo^ = 0((5«). (6.6) 

Let t = 1. Consider the {t + l)th iteration. Let £n\{x, y) = <?^o' (x, y) as defined in 
Lemma 6.2. By the conditions on bandwidths in (C5), wc have ei^) Ai^V/i? ^ 
and 5^V^i ~^ 0- -^y Lemma 6.2, similar to (6.3), we have from the algorithm 

n 

= (So, Bo)Cj^HBo, BoV + ^ {5« + (5«)^} + ^(e,,,;., (6-7) 

j,k=i 

where = p5|i?o{ V% {BlX„Yk) + ^(/i^ + <5,;,, + e^rt^)}{£^liX„Yk)y and 

r-W ^ ( 0{eghb) \ 

where A^*^ = ^^^^^^ pf^ y 56, (^J , n) V^56, (^J , n) + 0{hl + V* + e^*^ }• 
Note that i3j^f,W(^.^y^) = q, 4*J,(X,-,n) = 0(^*6*) and B-^ £^^l{X^,Yk) = 
OiSghtbtSs')- It follows that 

n 

„-2"^ /q(*) _l /'c(*)\T 



n 

{Bo, Bo) (So,5o)^n-3^{5j2 + (sj2)^}(5o,Bo) (So, So)" 



= (So,So) (,) '''^^)iBo,BoV + 0{dgh,,,d'i\ (6.8) 

whereC^l^ = n-^Y:lk=iPfk{'7gb,{BjXj,Y,) + 0{hl + SgH,+e^^^^^^^ 

Bo. Similar to pSJ, we have pf^^ = pft,+0{rqhtbt) where pJJ = p(/Bo(Xj)){p(/y (Ffe)) 

+p'(/y(n))/^(^fc)btV2}. By (C5) and Lemma 6.7, we have 



n 



i,fc=i 

=0((5„ + 5l^^^^ + 526-2 + Tgh^bAhtbt + 4^^<ih,b,). (6.9) 

By the strong law of large numbers for U-statistics, it follows A^*^ = Mo + o(l) 
almost surely, where Mq is defined in (C3). Let A^"*"^-* > ... > Ap"*"^^ be the eigen- 
values of ^{t+i) and B^^_^l^| the first q eigenvectors. By the same arguments as 
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for x!k\ it follows from (6.7), (6.8) and (6.9) that min{Af+^\ A^+^^j > c and 
max{Aj*^!]^\...,A{,*+^^} = 0{Ai*+^^}, where A^*"^^^ = eqUtht^qhtbt + ^lh,bt + Vt6t<^B ■ 
Considering en^^^ht-\-i Ai*'''^'*//it+i, there exists a constant ci, which does not 
depend on t, such that 

er^^,+i < cUx^l + Xt4^h + x^^g)}, (6.10) 

where Xo*n = {hi + hf6qhtbt+SqhtbAht)Sqhtbt/ht+l, Xi*n = {hf+6qhtbt)Sqhtbt/ihtht+l) 

and X2*l = Sqhtbt/ht+i- By (6.7) and (6.8), we write 

^(t+l) = -BoA^*^-Bo + BoC^2,n^O + -^o(C'i2,n-So)^ + 0{eqhtbt + <^q/Jt6t<^B''}) (S-H) 

where (5^2 n ^'^^ ^''^^ term on the right hand side of the first equation in (6.9). By 
the same arguments as for (6.6), we have -B(t+i)-Bj^^-) — BqB^ = 0{5qhtbt{^qhtbt + 
rqh^b,) + {hi + rqhtbt)e^n^ + {h + 5qhb/h)5^S +^n + ^l/hl + hi). That is 

<^r^<C2{xa + X?W)/., + xa<5g)} (6.12) 

for a constant C2 independent of t, where x^^^ = Sqhtbt{Sqhtbt+rqhtbt)+hf+6^/bf+Sn, 
X4*i = {ht+'rqhtbt)/ht and xiji = ht + ^qhtbt/h- Note that ht and 6t decreasing with 
i, by (C5) we have Sqhtbt/ht+i < ^qm/h — 0. It follows that en^"*^^ = \n^^^ /h'^.^i 
0, 5^+^^ = 0{rqh^bt) and ^ 0. Recursing (6.10) and (6.12), it follows 

that 

4°°^ = Oixtri + Xl?xi?} = 0{h' + Sq^iSq^ + + 6^) + <5^/6^ + Sr.} 

and en°°'* = 0{5qfit))- This is the first part of Theorem 3.1. By (6.11) and the 
equations above, write 

S(oo) = {Bo + Vn}^t\Bo + 7jny + 0{h^ + Sqm{dqm>+-b^) + Sl/d^}, 

where rjn = C'ig(A^°°V' = 0{h'' + 5q^{dq^ + 5^) + S^/d^ + Sn}. Note that 
^(L)^B(^) (^) = and thus Bj^^rin = 0. We have A„ =^ (Sq + VnV{Bo + ??n) = 
Iq + 0{dl). Let = {-Bo + r]n}A.n^^^. It follows that 

S(oo) = %Ai°°^»7n +C?{^^ + '^.?^('^.m,+5^)+^n/5'}■ 



29 



Let BdOPG be the first q eigenvectors of S(oo)- By Lemma 3.1 of Bai et al (1991), 
we have 

BdOPoBjoPG - BoB^ = BoVn + VnB^ + + 6gm{Sgm + 6l/d^}. (6.13) 
By Lemma 6.7 and (C5), we have 

n 

vn = n-2^p(/Bo(^i))/>(/.(n))4:^^(^i,n)v^56(So^,-,n)(Ai°°^)-' 
j,k=i 

n 

= n-i^p(/Bo(X,))p(/,(y,))<(^0^.„(^0C7(Ai°°))-' + 0{r,^V}, 

i=l 

where Q = ^g^{B'^Xi,Yi)U{Yi) - E{^gt,{B'^Xi,Yi)U{Yi)\B'^Xi}. Let Q = 
^fiYi\BlXi)U{Yi) -E{s7fiYi\BlXi)U{Yi)\B'^Xi}. As 6 ^ 0, we have Ai°°) ^ 
Mo almost surely, where Mq is defined in (C3). By calculating the mean and co- 
variance matrix, we have 

n 
i=l 

It follows from the two equations above and the conditions in the Theorem for the 
bandwidths 

//, 

Vn = n-'Y,pifBo{Xi))p{fAm^to(^i>BMi)ClM^' + op{n-'/^). (6.14) 

i=l 

After vectorizing r/„, the second part of Theorem 3.1 follows from (6.13), (6.14) and 
the central limit theorem. □ 
Proof of Theorem 3.2 Consider the initial estimator B^^i^ in (6.6). Let Q = 
B^^^Bq. For simplicity, we assume Q = Iq, otherwise, we may use basis BqQ and 
consider the expansion in Lemmas 6.3, 6.4 and 6.5 at {BqQY^x. Let 5^^ be the 
consistency rate of the estimator in the t'\h iteration. Write ^(Sq) = {Iq (8) Bo)£{Iq). 
By the definition of in Lemma 6.4, it follows 

{Iq ® B)'^ Db = 0, Iq®B = Iq®BQ + 0{8B), {Iq ® Bq^ ^^{Bq) = Q. (6.15) 

By the definition of the Moore-Penrose inverse we have D~^Db = Iq®{BB^), where 
{B,B) is a p X p orthogonal matrix. By Lemmas 6.4, 6.5 and (6.1), for every 
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in B = {B:\B- Bo\ < ^g^}, if s''^' /ht ^ we have 

b(*+l) = (/^0 5o){W+O(4*^)} + ^*(t)W5(t))-W} + ^^J)^n(5o) 

+0{At + (ht + Sgh,bJht)~S^S}^ (6-16) 

where = hf + {hj + bi + 5,^,6,) + ^l/b^, ^1*^ = {A^ + {d,h,bjht + ht)s'i>}/hl 
D^t) = Db^^) and ^^t) = Iq® (-^(t)-^(T)) = * + > where * = Jq (g) (SoSj") is a 
projection matrix and (5o, Bq) is a p x p orthogonal matrix. We have 

A1(b(*+i)) ^ 5oAW + l>i(^-{£(S(,))-£(5o)}) + ^A^(i^J)$n(5o)) 
where A^*^ = + 0(cn'*) and M.{.) is defined in section 2.2. Note that 

where 5n = 5n + If c4*^ = almost surely, then by Step 3 

= So + - £(i?o)}) + \m{D+^^^{B^)) 

= Bo + ^Min^iB^t)) - KBo)}) + 0{Sn + At + {ht + Sgh,bJht)~6^S}- (6-17) 

By (C5) and (6.6), we have Sqh^b^/hf < Sqf^/h^ — 0, /h\ and dk^ 
almost surely. Thus (6.17) holds for t = 1. By assumption (C5), it follows that 
^bV^2 = o(l) and Cn^ = o(l) almost surely. Thus (6.17) holds for t = 2. Recurring 
the formula, we have 

A more detailed deduction was given in Xia, Tong and Li (2002). Therefore, the 
first part of Theorem 3.2 follows immediately. By the first equation of (6.17) with 
t = oo and Lemma 6.5, we have 

Bioo)-Bo = ^A^(*{£(S(oo))-^(i?o)}) + ^A^(l?Jo)^-(^o)) 

+Op{h'' + {h" +t>'' + 6gf^)dgra,}. 
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Multiplying both sides by Bq, by (6.15) we have 

bJb^^^-i = Op{h'' + {h^+d^ + 5gf^)6gf^}. 

It follows that 

^{oo)^M^o-Bo = ^Mm^{B^oo))-£{Bo)}) + lM{D+^^MBo)) 

+Op{fi' + {h''+d'' + Sgr^)dq^}. 

Note that ^i^J^) = D+^^ + Op {6^^^). We have 

£(i?(^)i?J,)i?o) - ^(^o) = D+^^^niBo) + Op{h^ + {h'' + 5gnb)Sgm}- 

This is the second part of Theorem 3.2. □ 

6.3 Auxiliaries 

Lemma 6.6 Suppose m„(x,Z),n = 1,2, • • • , are measurable functions of Z with 
index x ^ ^"^^ where d is an integer, such that (I) |m„(x, -2^)1 < M{Z) with 
E{M''{Z)) < oo for some r > 2; (II) sup^ £^|mn(x, < and (III) |m„(x,^) - 
m„(x',^)| < |x-x'l"'""'G(Z) with some ai,a2 > and £"10(^)1 < oo. Suppose 
{Zi,i = 1, • • • , n} is a random sample from Z. If a„ = cn~^ with < S < 1 — 2/r 
and c > 0, then for any positive ao we have 

n 

' Y^imnix, Zi) - Erunix, Z,)} = 0{(a„ log n/n)^/2| 



n 

1=1 



sup 

|x|<n«o 

almost surely. 

Proof of Lemma 6.6 The "continuity argument" approach is used here. See, 
e.g. Mack and Silverman (1982) and Hardle et al (1993). Note that P„ {|x| < 
n°o} is bounded and its Borel measure is less than cin""'^ for some constant ci. 
There are n"^* (04 > a^d + (1 + a2)dla\) balls Bn^ centered at Xn^j ^ < k < n°'*, 
with diameter less than C2n~(-'^+"^2)/ai^ g^ch ^Jia^ Ui<jt<„a4 -Bnfc- It follows that 

sup \-y2{mn{x,Zi) - Emn{x,Zi)} 
I 1 " 

l<K<n™4 I n 

1=1 



32 



1 " 

+ max sup - V {mn(x,^i) - "T'„(xnfe,^i)} 

-E{mn{x, Zi) - mn{Xnk,Zi)} 
def 

= max \Rn,k,i\+ max sup \Rn,k,2\- (6.18) 

l<fc<n"4 l<A;<n«4 -^gB„^ 

By condition (III) and the definition of -B„^, we have 

, max sup \mn{x,Zi)-mn{Xnk,Zi)\ < max sup n"^!;^-;^ |°iG(Zi) 

< C3n-^G{Zi). 

By the strong law of large numbers, we have 



max sup \Rn,k,2\<C4n-^y2{G{Zi)+EG{Zi)} = 0{n-^) (6.19) 

almost surely. Let = (na„/ log r^)^/^ m^(xn^, .Zj) = runixuk^ Zi)I{\M{Zi)\ > r„} 
and m^(xnfe, -^i) = mn(xnfc, -^i) - m°(xnfc, -^i)- Write 

^.^.1 = - E [<(Xn„ ^0 - E{m"^{xn„Zi)}] in„i, (6.20) 

i=l i=l 

where ^n^,j = mf^^Xuk^Zi) - E{mj^{xnk^ ^i)}. By the truncation, it follows that 

E\m"^{Xn„Zi)\ <T-^+'E\M{Z,W. 
If a„ = cn""^ with < (5 < 1 — 2/r, we have 

n 

n-i| ^i?m°(xn„^i)| < E\M{Z^)\^T-^+' = o({a„ log(n)/n}V2). (6.21) 

i=l 

Again by the truncation, we have 

J2 \<{xn„Zi)\ < J2 mzmmzi)\ > r„) < r^+^x; |M(z,)r/(iM(z,)i > r„). 

i=l i=l 1=1 

For fixed T, by the strong law of large numbers, we have 



n 

-1 

n 

1=1 



^ \M{Z,)\^I{\M{Z,)\ > T) ^ E{|M(Zi)ri(|M(Zi)| > T)} 
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almost surely. The right hand side above is dominated by E{\M{Zi)\^} and — as 
T — oo. Note that increase to oo with n. For large n such that r„ > T, we have 



n 

def _i 



Cn =' J2 \M{ZiWl{\M{Zi)\ > Tn) < n"! \M {ZiW I {\M (Z^)] >T)^0 



i=l i=l 

almost surely as T — > cxd. It follows 



l<k<n°'i ' 
1=1 

almost surely. By condition (II), we have 

n 

max Var(>^£„. j) < n max E{ml,{~Xnu, 



l<fe<n"4 l<fe<n"4 



< n max E{mn{Xnk, ^i)} < c^nan = Ni. (6.23) 

l<fc<n"4 

By the condition on a„ and the definition of ^n^.i; we have 

max < ceT^ = ceinaj log n) ^2 N2. (6.24) 

l<fc<n™ 

Let iV3 = C7(na„logn)-'^/2 -^^i^h Cy > 2(0:4 + 2)(c5 + cec/). By the Bernstein's 
inequality (cf. DE LA Pena, 1999), we have from (6.23) and (6.24) that 

P(lE?„.,.l>iV3) < 2exp(5^^^^-^j 

< 2exp{-c?logn/(2c5 + 2c6C7)} 

It follows that 

oc n 00 n 

E P^(,<T<\^.. I E ^--^1 > ^3) < E n- ^ max ^ Pr(| ^n,,! > ATa) < 00. (6.25) 

n=l i=l n=l i=l 

By the Borel-Cantelli lemma (cf. Chow and Teicher, 1978, p.60), we have 

n 

,i?|^«jEW^I = ^TO (6.26) 

i=l 

almost surely. Combining (6.20), (6.21), (6.22) and (6.26), we have 



max \Rn,k,i\ = 0{(a„log(n)/n)V2} (6.27) 
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almost surely. Lemma 6.6 follows from (6.18), (6.19) and (6.27). □ 
For any function G{Xi,Yi,Xj, Yj,Xk,Yk) (or G{Xj, Yj,Xk,Yk)), we introduce 
a projection operator Ej. as follows. 

EkG{Xi, Yi, Xj,Yj,Xk, Yfe) = E{G{Xi, Yi, Xj,Yj,Xk, Yk)\Xi, Yi, Xj,Yj}. 

Lemma 6.7 Let A = {A : A = /«} with 1 < k < p. Suppose go{y), gi{x), g2{x) 
are bounded continuous functions. If conditions (C2) and (C4) hold with B replaced 
by A for all A e A, then 

n 

^ Kh{A^Xij)gi{Xi)g2iXj)goiYk)vgb{B^Xj,Yk)eb,iiYk) 

i,j,k=l 

n 

= n-' EjEk{Kh{A^ Xij) y gb{B^ Xj ,Yk)eb,i{Yk)} + 0{<i,hb\A G A), 
1=1 

where <jK/ib = 6^h'''^b~'^ + (5^^^ + and the first term on the right hand side is 

Proof of Lemma 6.7 For easy of exposition, we consider gk = l,k = 0,1,2 
only. Let An{A) be the left hand side of the equation in the lemma. Let iPi^{s) = 
f c^p{is^u)K{u)du and ^Pjj{t) = {2tt)^^ f cxp{itv)H{v)dv be the Fourier 
transformations, where i is the imaginary unit. It follows from the inverse Fourier 
transformation that gb{u,y) = b'^ J if„{t')e-'^'y'^E{e'*'^/\ X = u}dt' . Thus 

Vgf,iBjXj,Yk) = b-^ I <pJt')vUBjXj)e-'''^'^/''dt', (6.28) 

where V9b{u) = dE{e'^'^/''\B^ X = u)/du. We have 

\ r " 

An(^) = ^^JV'At') Y i^h{A'^^^)'^9b{B^^j)-Ej[Kn{A'^Xi,) 

i,j,k=l 

X V ~gb{BjX,)]}{ei„{Yk)e-''''''^/' - £;fcK,(n)e-'*'^'=/^]}dt' 
I "^"^^'^ ^ Ej[K^{A'^Xij)v~g,{BjXj)] 

i,k=l 

x{e6,.(n)e-'*'^'=/^ - i^fch,.(n)e-'*'^'=/^]}di' 
^^b f ^-^^'^ ^ EkMYk)e-^''''>'l']{Kt,{A'X,,)y~g,{BlX,) 

-Ej[Kh{A^Xi,)^UB'^X^)]}dt' 
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1 r " 

1=1 

An,i{A) + An,2{A) + A„,3(^) + A„,4(A). (6.29) 



By the inverse Fourier transformation, it follows that Kh{A^Xij) = h J fii{s) 
^-is-^A^Xii/h^^ and Hb{Yi - Yk) = h'^ J ip„(t)e-<^'-^''^/^dt . Thus 



where 

m,,n{A, s, t, t', X„ ¥,) = e-"^"^-/'' y gb{B^X,) - £;[e-^^^^^/'» y 9b{BjXi)], 
m2,n{A, s, t, t', X,, Yi) = e^(t~m/^ _ E[e^(t-m/b^ 

and 

ms,n{A,s,t,t',Xi,Yi) = e-^'^^/' - E{e-''^^/'\Xi). 

By (C2), we have that | y gb{u)\ < / | y fo{y\u)\dy is bounded. For any r > 2, 
it follows that sup^/ E{\y'gi,{BQ Xi)Y < c and that 

sup ^ E\me,,,{A, s, t, t', Xi, < c, £=l,2, 3, 

where c is a finite constant. For any ao > 0, let = {{t,t',s) : \t\ < n°'° , \t'\ < 
n°'o, \s\ < n"o}. By taking % = {A,t,t',s) and a„ = c, we have from Lemma 6.6 

n 

sup n-^\y^me,n{A,s,t,t',Xi,Yi) = 0(5„), £=1,2,3 (6.30) 
almost surely. On the other hand, \m£^n{A, s,t,t' , Xi,Yi)\ is bounded. Thus, 

n 

sup n-^\^me,n{A,s,t,t',Xi,Yi) =0(1), £=1,2,3. (6.31) 
AeA{t,t',s) ' i=i 

By (C4), the Fourier transformation functions and iPu{.) are absolutely inte- 

grable; see Chung (p. 166, 1968). We can choose ao such that 

/ \vAs)\ds = 0{5l), I \^„{t)\dt<0{Sl). (6.32) 
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Partition the integration region in A„^i(A) into two parts, we have from (6.30)- (6.32) 
that 



sup 
AeA 



X |(^^ {s)ipH {t)<fH {t')\dsdtdt' 

X \tpj^ {s)ipjj {t)ipu {t')\dsdtdt' 
= (h'^b^r'Oid^^) I \ipAs)^,{t)ipAt')\dsdtdt' 

+{h%'')-'0{l) [ \^,{s)^,{t)^,{t')\dsdtdt' 

= 0{5lh-%-'^) (6.33) 

almost surely. Let g{Xi) = Ej[Kyi[A^ Xij) y 9b{BQ Xj)\. It is easy to see that 
g{Xi) = 0(1) almost surely. Applying the inverse Fourier transformation to £6,i(lfe) 
and using similar arguments leading to (6.33), we have 

sup A„,2(^) =0{5lb-'') (6.34) 

almost surely. Applying the inverse Fourier transformation to Kh{A^ Xij), similar 
to (6.33) we have 

sup A„,3(^) = 0{5lh-''h-^) (6.35) 

AeA 

almost surely. By (6.28), we have 

n 

A„,4(A) = n-^Y.^3Ek{Kh{A^ Xij) y gb{B^ Xj,Yk)eb^i{Yk)}. 



i=l 



By Lemma 6.6, we have 



sup A„,4(^) = 0{5n) 

AeA 



(6.36) 



almost surely. Finally, Lemma 6.7 follows from (6.33)-(6.36) and (6.29). □ 
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Abstract 

In this paper, we propose two new methods to estimate the dimension- 
reduction directions of the central subspace (CS) by constructing a regression 
model such that the directions are all captured in the regression mean. Com- 
pared with the inverse regression estimation methods (e.g. Li, 1991, 1992; 
Cook and Weisberg, 1991), the new methods require no strong assumptions 
on the design of covariates or the functional relation between regressors and 
the response variable, and have better performance than the inverse regression 
estimation methods for finite samples. Compared with the direct regression 
estimation methods (e.g. Hardle and Stoker, 1989; Hristache, Juditski, Polzehl 
and Spokoiny, 2001; Xia, Tong, Li and Zhu, 2002), which can only estimate the 
directions of CS in the regression mean, the new methods can detect the direc- 
tions of CS exhaustively. Consistency of the estimators and the convergence of 
corresponding algorithms are proved. 

Key words: Conditional density function; Convergence of algorithm; Double- 
kernel smoothing; Efficient dimension reduction; Root-n consistency. 
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1 Introduction 



Suppose X is a random vector in MP and F is a univariate random variable. Let 
-^0 = (/?oi) ■ ■ ■ jPoq) denote a p x q orthogonal matrix with q < p, i.e. BqBq = Iq, 
where Iq is a q x q identity matrix. Given B^X, if Y and X are independent, 
i.e. Y X X\BqX, then the space spanned by the column vectors /?oi)/?02, • • • ,Poq, 
S{Bo), is called the dimension reduction space. If all the other dimension reduction 
spaces include S{Bo) as their subspace, then S{Bo) is the so-called central dimension 
reduction subspace (CS); see Cook (1998). The column vectors /3oij /3o2, • • • , Poq are 
called the CS directions. Dimension reduction is a fundamental statistical problem 
both in theory and in practice. See Li (1991, 1992) and Cook (1998) for more 
discussion. If the conditional density function of y given X exists, then the definition 
is equivalent to the conditional density function of Y\X being the same as that of 
Y\BqX for all possible values of X and Y, i.e. 

/m(y|^) = 4|,T,(2/l4^)- (1-1) 

Other alternative definitions for the dimension reduction space can be found in the 
literature. 

In the last decade or so, a series of papers (e.g. Hardle and Stocker, 1989; Li, 
1991; Cook and Weisberg, 1991; Samarov, 1993; Hristache, Juditski, Polzehl and 
Spokoiny, 2001; Yin and Cook, 2002; Xia, Tong, Li and Zhu, 2002; Cook and Li, 
2002; Li, Zha and Chiaromontc, 2004; Luc, 2004) have considered issues related to 
the dimension reduction problem, with the aim of estimating the dimension reduc- 
tion space and relevant functions. The estimation methods in the literature can be 
classified into two groups: inverse regression estimation methods (e.g. SIR, Li, 1991 
and SAVE, Cook and Weisberg, 1991) and direct regression estimation methods 
(e.g. ADE, Hardle and Stoker, 1991 and MAVE of Xia, Tong, Li and Zhu 2002). 
The inverse regression estimation methods are computationally easy and are widely 
used as an initial step in data mining, especially for large data sets. However, these 
methods have poor performance in finite samples and need strong assumptions on 
the design of covariates. The direct regression estimation methods have much bet- 
ter performance for finite samples than the inverse regression estimations. They 
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need no strong requirements on the design of covariates or on the response variable. 
However, the direct regression estimation methods cannot find the directions in CS 
exhaustively, such as those in the conditional variance. 

None of the methods mentioned above use the definitions directly in searching 
for the central space. As a consequence, tlicy fail in one way or another to estimate 
CS efficiently. A straightforward approach in using definition (1.1) is to look for Bq 
in order to minimize the difference between those two conditional density functions. 
The conditional density functions can be estimated using nonparametric smoothers. 
Obviously, this approach is not efficient in theory due to the "curse of dimension- 
ality" in nonparametric smoothing. In calculations, the minimization problem is 
difficult to implement. People have observed that the CS in the regression mean 
function, i.e. the central mean space (CMS), can be estimated much more efficiently 
than the general CS. See, for example, Yin and Cook (2002), Cook and Li (2002) 
and Xia, Tong, Li and Zhu (2002) . Motivated by this observation, one can construct 
a regression model such that the CS coincides with the CMS space in order to re- 
duce the difficulty of estimation. In this paper, wc first construct a regression model 
in which the conditional density function fY\x{y\^) is asymptotically equal to the 
conditional mean function. Then, we apply the methods of searching for the CMS to 
the constructed model. Based on the discussion above, this constructive approach 
is expected to be more efficient than the inverse regression estimation methods for 
finite samples, and can detect the CS directions exhaustively. 

In the estimation of dimension reduction space, most methods need in one way or 
another to deal with nonparametric estimation. In terms of nonparametric estima- 
tion, the inverse regression estimation methods employ a nonparametric regression 
of X on y while the direct regression estimation methods employ a nonparametric 
regression of Y on X. In contrast to existing methods, the methods in this pa- 
per search for CS from both sides by investigating conditional density functions. 
A similar idea appeared in Yin and Cook (2005) for a general single-index model. 
To overcome the difficulties of calculation, we propose two algorithms in this pa- 
per using a similar idea to Xia, Tong, Li and Zhu (2002). The algorithm solves 
the minimization problem in the method by treating it as two separate quadratic 
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programming problems, which have simple analytic solutions and can be calculated 
quite efficiently. The convergence of the algorithm can be proved. Our constructive 
approach can overcome the disadvantages both in inverse regression estimations, 
requiring a symmetric design for explanatory variables, and also the disadvantage 
in direct regression estimation, of not finding the CS directions exhaustively. Sim- 
ulations suggest that the proposed methods have very good performance for finite 
samples and are able to estimate the CS directions in very complicated structures. 
Applying the proposed methods to two real data sets, some useful patterns have 
been observed, based on the estimations. 

To estimate the CS, we need to estimate the directions Bq as well as the di- 
mension q of the space. In this paper, however, we focus on the estimation of the 
directions by assuming that q is known. 

2 Estimation methods 

As discussed above, the direct regression estimations have good performance for 
finite samples. However, it cannot detect exhaustively the CS directions in com- 
plicated structures. Motivated by these facts, our strategy is to construct a semi- 
parametric regression model such that all the CS directions are captured in the 
regression mean function. As we can see from (1.1), all the directions can be cap- 
tured in the conditional density function. Thus, we will construct a regression model 
such that the conditional density function is asymptotically equal to the regression 
mean function. 

The primary step is thus to construct an estimate for the conditional density 
function. Here, we use the idea of the "double-kernel" local linear smoothing method 
studied in Fan et al (1996) for the estimation. Consider H{,{Y — y) with y running 
through all possible values, where H(v) is a symmetric density function, & > is a 
bandwidth and Hi,{v) = h~^H{y/h). If 6 — > as n — oo, then from (1.1) we have 

m,{x,y) =^ E{H,{Y - y)\X = x) = E{H,{Y - y)\BlX = bIx) ^ /y|sTx(yl4^)- 

See Fan et al (1996). The above equation indicates that all the directions can be 
captured by the conditional mean function mb{x,y) of Hb(Y — y) on X = x with 
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X and y running through all possible values. Now, consider a regression model 
nominally for H}j{Y — y) as 

Hb{Y -y) = m,{X,y)+eMX), 

where = H^{Y ^y)-E{Hb{Y -y)\X) withEeMX) = 0. Let y) = 

E{Hb{Y - y)\BlX = B^x). If (1.1) holds, then mb{x,y) = gb{B^x,y). The model 
can be written as 

Hb{Y -y) = gb{BjX, y) + e,iy\X). (2.1) 

As 6 ^ 0, we have gj,{BQX,y) —> IyibJ xivlBo x). Thus, the directions Bq defined 
in (1.1) are all captured in the regression mean function in model (2.1) if y runs 
through all possible values. 

Based on model (2.1), we propose two methods to estimate the directions. One 
of the methods is a combination of the outer product of gradients (OPG) estima- 
tion method (Hardle, 1991; Samarov, 1993; Xia, Tong, Li and Zhu, 2002) with the 
"double-kernel" local linear smoothing method (Fan et al, 1996). The other one is 
a combination of the minimum average (conditional) variance estimation (MAVE) 
method (Xia, Tong, Li and Zhu, 2002) with the "double-kernel" local linear smooth- 
ing method. The structure adaptive weights in Hristache, Juditski and Spokoiny 
(2001) and Hristache, Juditski, Polzehl and Spokoiny (2001) are used in the estima- 
tions. 

2.1 Estimation beised on outer products of gradients 

Consider the gradient of the conditional mean function mb{x, y) with respect to x. 
If (1.1) holds, then it follows 



dini,(x,y) _ dgb{B^^^ x,y) 



Bo^gb{B^x,y), (2.2) 



dx dx 

where \/g^{vi, ■■■,Vq,y) = {\/ig^{vi, ■ ■ ■ ,Vq,y),- ■ ■ , S/ggti'"!^ ■■■,Vq, y)V with 

Vkg^ivi,--- ,Vq,y) = -K—gb{vi,---,Vq,y), k = l,2,---,q. 

Thus, the directions Bq are contained in the gradients of the regression mean func- 
tion in model (2.1). One way to estimate Bq is by considering the expectation of 



the outer product of the gradients 

It is easy to see that Bq is in the space spanned by the first q eigenvectors of the 
expectation of tlic outer products. 

Suppose that {{Xi,Yi},i = 1,2,- ■■n} is a random sample from {X,Y). To 
estimate the gradient dmi,{x,y)/dx, we can use the nonparametric kernel smoothing 
methods. For simplicity, we adopt the following notation scheme. Let Kq{v'^) be a 
univariate symmetric density function and define K{vi, ■ ■ ■ , Vd) = Kq^v^ + • • • + v"^) 
for any integer d and Kh{u) = h~'^K{u/h), where d is the dimension of u and ^ > 
is a bandwidth. Let Hi,,i{y) = Hb{Yi — y), where H{.) and b are defined above. For 
any {x,y), the principle of the local linear smoother suggests minimizing 

n 2 

1 J2 {Hb,i{y) - « - b'iXi - x)] KhiXi,) (2.3) 



n 

1=1 



with respect to a and b to estimate mi,{x,y) and dmh{x,y)/dx respectively, where 
= Xi — x. See Fan and Gijbels (1996) for more details. For each pair of (Xj, Y^), 
we consider the following minimization problem 

n 2 

(«jfe, bjk) = arg min V \Hbi{Yk) - ajk - bjkXij Wij, (2.4) 

where Xij = Xj — Xj and Wij = Kfi{Xij). We consider an average of their outer 
products 

n n 

k=i j=i 

where pj^ is a trimming function introduced for technical purpose to handle the 
notorious boundary points. In this paper, we adopt the following trimming scheme. 
For any given point {x,y), we use all observations to estimate its function value and 
its gradient as in (2.3). We then consider the estimates in a compact region of {x, y). 
Moreover, for those points with too few observations around, their estimates might 
be unreliable. They should not be used in the estimation of the CS directions and 
should be trimmed off. Let p{-) be any bounded function with bounded second order 
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derivatives on M such that p{v) > ii v > loq; p{v) = ii v < luq for some smah 
ujo > 0. We take pjk = p{f{Xj))p{fY{Yk)), where f{x) and are estimators of 

the density functions of X and Y respectively. The CS directions can be estimated 
by the first q eigenvectors of S. 

To allow the estimation to be adaptive to the structure of the dependency of Y 
on X, we may follow the idea of Hristache et al (2001) and replace Wij in (2.4) by 

Wij = Kh{ty^Xij), 

where S^/^ is a symmetric matrix with (S^/^)^ = S. Repeat the above procedure 
until convergence. We call this procedure the method of outer product of gradient 
based on the conditional density functions (dOPG). To implement the estimation 
procedure, we suggest the following dOPG algorithm. 

Step 0: Set S(o) = Ip and t = 0. 

Step 1: With Wij = K^iJ^^^-^ ^ij): calculate the solution to (2.4) 



^jk ' i=\ 



{E-.<-«)U)a,) y 



where ht and ht are bandwidths (details are given in (2.6) and (2.7) below). 
Step 2: Define = p(/W(X,))p(/W(n)) with 



where X^i^\k = 1, - ■ ■ ,p, are the eigenvalues of T,^^^^ and p = J Ko{ J2 "^l) 
^^(t)^. dvk- Calculate the average of outer products 



j,k=l 
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Step 3: Set t := t + 1. Repeat Steps 1 and 2 until convergence. Denote the 
final value of by S(oo)- Suppose the eigenvalue decomposition of '^[oo) is 
rdiag{\i, ■ ■ • , Ap)r^, where Ai > • • • > Ap. Then the estimated directions are 
the first q columns of F, denoted by BdOPG- 

In the dOPG algorithm, f^Xy) and p\x), t > 0, are the estimators of the 
density functions of Y and B^X respectively. A justification is given in the proof 
of Theorem 3.1 in Section 6.2. In calculations, the usual stopping criterion can be 
used. For example, if the largest singular value of E^^) — £(t+i) is smaller than 10~^ 
then we stop the iteration and take '^(t+i) ^ the final estimator. The eigenvalues 
of S(oo-) can be used to determine the dimension of the CS. However, we will not go 
into the details on this issue in this paper. In practice, we may need to standardize 
Xi = {Xii, ■ ■ ■ , XipY by setting Xj := S~^^'^{Xi — X) and standardize Yi by setting 
Yi := (Yi-Y)/^, where X = n"! E7=i X^ and = n'^ T.7=liX^-XKX^-X)' , 
Y = n-^ Yli=i Yi and = n"^ E"=i(^i - ^)^- Then the estimated CS directions 
are the first q columns of TS~^^^. 

2.2 MAVE based on conditional density function 

Note that if (1.1) holds, then the gradients dmb{x, y)/dx at all (x, y) are in a common 
g-dimensional subspace as shown in equation (2.2). To use this observation, we can 
replace h in (2.3), which is an estimate of the gradient, by Bd{x,y) and have the 
following local linear approximation 

n 

n-' Y,{Hb,i{y) - « - d^B^i^i - x)}^Kh{Xi^), 

i=l 

where d = d(x,y) is introduced to take the role of sjgi,{BQ x,y) in (2.2). Note that 
the above weighted mean of squares is the local approximation errors of Hh^i{y) by 
a hyperplane with the normal vectors in a common space spanned by B. Since 
B is common for all x and y, it should be estimated with aims to minimize the 
approximation errors for all possible Xj and Y^. As a consequence, we propose to 
estimate Bq by minimizing 

n n n 

E E Pi'' Ei^M(^fe) - - d]kB^Xij}^Wij (2.5) 

k=l j=l i=l 
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with respect to ajk,djk = {djki, ■ ■ • jdjkqy ,j,k = 1, ...,n and B : B B = Iq, where 
Pjk is defined above. This estimation procedure is similar to the minimum average 
(conditional) variance estimation method (Xia, Tong, Li and Zhu, 2002). Because 
the method is based on the conditional density functions, we call it the minimum 

average (conditional) variance estimation based on the conditional density functions 
(dMAVE). 

The minimization problem in (2.5) can be solved by fixing {ajk,djk),j,k = 
l,...,n, and B alternatively. As a consequence, we need to solve two quadratic 
programming problems which have simple analytic solutions. For any matrix B = 
{Pi, - ■ ■ , Pd), we define operators £{.) and M{.) respectively as 

^(B) = (/?!, •••,/3j)T and Mi£{B)) = B. 

We propose the following dMAVE algorithm to implement the estimation. 

Step 0: Let be an initial estimator of the CS directions. Set t = 1. 

Step 1: Let B = calculate the solutions of {ajk,djk),j,k = l,...,n, to the 

minimization problem in (2.5) 




where ht and bt are two bandwidths (details are discussed below). 

Step 2: Let = ^(/b^,, (X,))p(/(*)(n)) with ff{y) = n-' Y^^, H,,4y) and 
fB(t){x) = Y,i=i Kht{Bjf-^Xix)- Fixing ajk = afl and djt = dfl, calculate 
the solution of B or e{B) to (2.5) 

k,j,i=l 

n 

X E pflKH,iBl)X,j)xg{H,^4Yk)-afl}, 

k,j,i=l 

where xg = d^^Xij. 
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Step 3: Calculate A(t+i) = {X(b(*+i))}^X(b(*+i)) and = M(b(*+^)A~^f). 

Set t -.= 1 + 1 and go to Step 1. 

Step 4: Repeat steps 1-3 until convergence. Let -B^t^,) be the final value of B(fy 
Then our estimators of the directions are the columns in B(^^y denoted by 

BdMAVE- 

The dMAVE algorithm needs a consistent initial estimator in Step to guarantee 
its theoretical justification. In the following, we use the first iteration estimator of 
dOPG, the first q eigenvector of as the initial value. Actually, any initial 

estimator that satisfies (6.6) can be used and Theorem 3.2 will hold. Similar to 
dOPG, the standardization procedure can be carried out for dMAVE in practice. 
The stopping criterion for dOPG can also be used here. 

Note that the estimation in the procedure is related with nonparametric esti- 
mations of conditional density functions. Several bandwidth selection methods are 
available for the estimation. Sec, e.g. Silverman (1986), Scott (1992) and Fan ct al 
(1996). Our theoretical verification of the convergence for the algorithms requires 
some constraints on the bandwidths although we believe these constraints can be 
removed with more complicated technical proofs. To ensure the requirements on 
bandwidths can be satisfied, after standardizing the variables we use the following 
bandwidths in our calculations. In the first iteration, we use slightly larger band- 
widths than the optimal ones in terms of MISE as 

1 1 
^0 = cqu , bo = Con po+s , (2.6) 

where po = max(p, 3). Then we reduce the bandwidths in each iteration as 

i_ i_ _i 

/ij+i = max{rn/it, Con 9+"}, = max{rn6t, con 9+3, con s} (2.7) 

for t > 0, where r„ = n^^/^'^^°~^^^\ cq = 2.34 as suggested by Silverman (1986) if 
the Epanechnikov kernel is used. Here, the bandwidth b is selected smaller than h 
based on simulation comparisons. 

Fan and Yao (2003, p. 337) proposed a method, called the profile least-squares 
estimation, for the single-index model and its variants by solving a similar mini- 
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mization problem as in (2.5). The method is also possible to be used here for the 
estimation of Bq in (2.1). 

3 Asymptotic results 

To exclude the trivial cases, we assume that p > 1 and q > I. Let fo{y\vi, ■ ■ ■ ,Vq), 
/o(fi, • • • , Vq) and /y(y) be the (conditional) density functions of Y\BJX, BJX 
and y respectively. Let /9o(x,y) = p(/o(-B([x))p(/y(y)), \/fo{y\vi,---,Vq) = 
idfo{y\vi, - ■ ■ ,Vq)/dvi, - ■ ■ ,dfoiy\vi, - ■ ■ ,Vq)/dvqy , hb{u) = E{X\B^X = u) and 
wb{u) = E{XX~^\B^X = u}. For any matrix A, let \A\ denote its largest singular 
value, which is same as the Eculidean norm if ^ is a vector. Let Bq : p x {p — q) 
be such that {Bq, Bq)^ {Bq, Bq) = Ip. We need the following conditions for (1.1) to 
prove our theoretical results. 

(CI) [Design of X] The density function f{x) of X has bounded second order deriva- 
tives on MP; E\X\^ < oo for some r > 8; functions hb{u) and wb{u) have 
bounded derivatives with respect to u and B for B in a small neighbor of Bq: 
\B - Bq\ <S for some S > 0. 

(C2) [Conditional density function] The conditional density functions fvixivl^) 
and fY\B^x (yW) have bounded fourth order derivatives with respect to x, 
u and B for S in a small neighbor of Bq; the conditional density function of 
/bTxyibTxC^' 1 1 V fo{y\'^)\dy are bounded for all u,y and v. 

(C3) [Efficient dimension] Matrix Mq = J po{x,y) y HyW^x) \f fQ{y\Blx)f{x) 
/y {y)dxdy has full rank q. 

(C4) [Kernel functions] Kq{v'^) and H{v) are two symmetric univariate density func- 
tions with bounded second order derivatives and compact supports. 

(C5) [Bandwidths for consistency] Bandwidths Hq = cin~^^ and 6o = C2n~^'' where 
< rh,n < l/(Po + 6), pq = max{p,3}. For t > 1, ht = max{r„/tt_i, ^} 
and bt = max{r„6t_i,S} where r„ = n"'"'"''^, h = c^nT'^h,^ = C4n~'^b with 
< r^, < l/{q + 3), and ci, C2, C3, C4 are constants. 

In (CI), the finite moment requirement for \X\ can be removed if we adopt 
the trimming scheme of Hardle et al (1993). However, as noticed in Delecroix et al 
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(2004), this scheme caused some technical problems in the proofs. Based on assump- 
tions (C2) and (C4), the smoothness of gb{u,y) is implied. Lower order of smooth- 
ness is sufficient if we are only interested in the estimation consistency. The second 
order differentiable requirement in (C4) can ensure the Fourier transformations of 

the kernel functions being absohitely integrable; see Chung (p. 166, 1968). The pop- 
ular kernel functions such as Epanechnikov kernel and quadratic kernel are included 
in (C4). The Gaussian kernel can be used with some modifications to the proofs. 
Condition (C3) indicates that the dimension q cannot be further reduced. For ease 
of exposition, we further assume that /xoi? = / H{v)dv = 1,ij,2H = J v^H{v)dv = 1, 
HQq = J K{vi, ■■ , Vq)dvi ■■■dvq = 1 and fi2q = J K{vi, ■ ■■ , Vq)vldvi ■■■dvq = 1; 

"1^/2 1/2 1 

otherwise, we take H{v) := H{v / T2jj ) / and K{vi ,Vq) = /Xq^ K{vi / -y/JJoq, ■ ■ , 
Vq/-s/Jl2q) I \/Wiq- The bandwidths satisfying (C5) can be found easily. For exam- 
ple, the bandwidths given in (2.6) and (2.7) satisfy the requirements. Actually, a 
wider range of bandwidths can be used; see the proofs. Let vb{x) = /xb(B^x) — x, 

wb{x) = wb{B^x) — fiB{B^ x) fij^{B^ x) and fo{x) = fQ{B^x). For any square ma- 
trix A, A^^ and A'^ denote the inverse (if it exists) and the Moore-Penrose inverse 
matrices respectively. 

Theorem 3.1 Suppose conditions (C1)-(C5) hold. Then we have 

\BdOPGBjoPG - BoB^l = Oih"" + + 6q^d^ + d^/d^ + n-V^) 

in probability as n ^ oo, where (5^^) = {nh% / log n)~^/^ and 5n = (log n/n)^/^. If 
^'^ + ^qhb + ^qKb^'^ + <^n/^^ = o(n~^/^) Can be satisfied, then 

V^{e{BaoPGBjoPGBo) - iiBo)} ^ N{0, Wo), 

where 

Wo = Var[poiX,Y)M^\s7foiY\B],X)UiY)-E{^fo(Y\B],X)UiY)\X}) 

The first part of Theorem 3.1 indicates that BaopG is a consistent estimator of 
an orthogonal basis, BqQ with Q = Bq BdoPG, in CS and \BdOPG — BqQ\ = 0{h'^ + 
S'^f^+6qM,t)'^+Sn/t>^+n^^^^) in probability. See Bai et al (1991) and Xia, Tong, Li and 
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Zhu (2002) for alternative presentations of the asymptotic results. If the bandwidths 
in (2.7) are used, then the consistency rate is 0(n~^/'^5+^)+^/(*+^) logn + n~^/^) in 
probability. Faster consistency rate can be obtained by adjusting the bandwidths. 
The convergence of the corresponding algorithm is also implied in the proof in section 
6. If < 3, then the condition for the normality can be satisfied by taking 

If we use higher order polynomial smoothing, it is possible to show that the root-n 
consistency can be achieved for any dimension q. See, e.g. Hardle and Stoker (1989) 
and Samarov (1993), where the higher order kernel, a counterpart of the higher 
order polynomial smoother, was used. However, using higher order polynomial 
smoothers increases the difficulty of calculations while the improvement of finite 
sample performance is not substantial. 

Theorem 3.2 If conditions (C1)-(C5) holds, then 

\BdMAVEBjMAVE - BqEJ \ = 0{fl' + + dg^h'' + d^/h^ + n'^l^} 

in probability as n — oo. If + + Sgf^d^ + (5^/5^ = o{n~^/^) can be satisfied, 
then 

V^{iiBdMAVEBjMAVEBo) - ^(^o)} ^ N{0,D+EoD+), 

where Dq = J po{x,y) V /o(y|-B([x) sffoiy\Blx) (g) {ug^{x)ul^{x)] fo{x)fY{y)dxdy 
and 

Eo = Var\p^{X,Y){s/fo{Y\BlX)U{Y) - E{s/ U{Y\BIX)U{Y)\X}) ® u^^{X)]. 

The proof of Theorem 3.2 is given in section 6. The convergence of the dMAVE 
algorithm is implied in the proof. Similar remarks on dOPG are applicable to 
dMAVE. Moreover, B^mave converges to BqQ, where Q is determined by the initial 
consistent estimator of the directions. For example, Q = B^^^Bq if is used as the 
initial estimator. Similarly, the root-n consistency holds for q < 3. It is possible that 
the root-n consistency holds for g > 3 if higher order local polynomial smoothing 
method is used. In spit of the equivalence in terms of consistency rate for both 
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dOPG and dMAVE, our simulations suggest that dMAVE has better performance 
than dOPG in finite samples. Theoretical comparison of efficiencies between the 
two methods is not clear. In a very special case when q = I and the CS is in the 
regression mean, Xia (2006a) proved that dMAVE is more efficient than dOPG. 

Wc here give some discussions about the requirements on the distributions of 
X and Y. If Y is discrete, we can consider the conditional cumulative distribution 
functions and have FY\x{y\x) = Fy^s^ xivl-^o ^) when Y A. X\Bq X holds. Similar 
to (2.1), we can consider a regression model 

liY <y) = GiB^X,y) + e{y\X), 

where G{Bjx,y) = E{I{Y < y)\X = x} = E{I{Y < y)\BjX = BJx} and 
e{y\X) = I{Y < y) — G{BQX,y). Similar theoretical consistency results are possible 
to be obtained following the same techniques developed here. If some covariates in 
X are discrete, our algorithms in searching for a consistent initial estimator will fail. 

However, if a consistent initial estimator can be found by for example the methods 
in Horowitz and Hardle (1996) and Hristachc, Juditski, Polzehl and Spokoiny (2001) 
and that B^ X has a continuous density function for all B in a neighbor around Bq, 
then our theoretical results in the above theorems still hold. 

4 Simulations 

We now demonstrate the performance of the proposed estimation methods by simu- 
lations. We will compare them with some existing methods including SIR (Li, 1991), 
SAVE (Cook and Weisberg, 1991), PHD (Li, 1992) and rMAVE (Xia, Tong, Li and 
Zhu, 2002). The computer codes used here can be obtained from www.jstatsoft.org/ 
vOl/iOl/ for SIR, SAVE and PhD methods (Courtesy of Professor S. Weisberg) and 
www.stat.nus. edu/~ycxia/ for rMAVE, dOPG and dMAVE. In the following calcu- 
lations, we use the quadratic kernel H{v) = Ko{v^) = (15/16)(1 — u^)^/('u^ < 1) and 
uq = 0.01. The bandwidths in (2.6) and (2.7) are used. For the inverse regression 
methods, the number of slices is chosen between 5 to 30 that is most close to n/ (2p). 
We define an overall estimation error of estimator B : W B = Iq by the maximum 
singular value of BqB^ — BB^; see Li et al (2004). 
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Example 4.1 Consider model 



Y = sign(2XT/3i + e{) log(|2XT/52 + 4 + ssl), 



(4.1) 



where sign(-) is the sign function. Coordinates X ~ N{0,lp), unobservable noises 
£i ~ A^(0, 1) and £2 ~ -^(0,1) are independent. For (3i, the first 4 elements are 
all 0.5 and the others are zero. For P2, the first 4 elements are 0.5,-0.5,0.5,-0.5 
respectively and all the others are zero. A similar model was investigated by Chen 
and Li (1998). In order to show the effect on the estimation performances of the 
number of covariates, we vary p in the simulation. With different sample sizes, 200 
replications are drawn from the model. The calculation results are listed in Table 1. 
To get an intuition about the quantity of estimation errors, Figure 1 shows a typical 
sample of size n = 200 and its estimate with estimation error 0.21. The structure 
can be estimated quite well in the sample. 
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Figure 1: A typical data of size 200 from Example 4-1 with p = 10 to show the quantity of 
estimation error and its graphic performance. The left two panels are plots of y against the 
true CS directions; the right two panels y against the estimated directions using dMAVE. 
The estimated directions are respectively f3i — (0.42, 0.64^, 0.^4, 0.45, -0.01, -0.07, 0.02, 
-0.00, -0.08, 0.01)^ and P2 = (-0.54, 0.43, -0.57, 0.43, 0.01, -O.O4, -0.01, 0.07, -0.05, 
0.07)^ with estimation error 0.21. 



Table 1: Mean (and standard deviation) of estimation errors for Example 4.1 



n p 


dOPG dMAVE rMAVE SIR SAVE PHD 


5 

100 10 
20 


0.25(0.09) 0.22(0.08) 0.43(0.19) 0.29(0.09) 0.87(0.19) 0.72(0.22) 
0.55(0.19) 0.35(0.07) 0.64(0.19) 0.46(0.10) 0.94(0.06) 0.90(0.13) 
0.81(0.13) 0.54(0.10) 0.88(0.12) 0.64(0.11) 0.96(0.06) 0.93(0.07) 


5 

200 10 
20 


0.17(0.05) 0.14(0.04) 0.27(0.13) 0.19(0.05) 0.55(0.26) 0.47(0.15) 
0.32(0.09) 0.24(0.06) 0.46(0.17) 0.30(0.06) 0.96(0.08) 0.73(0.16) 
0.62(0.15) 0.36(0.06) 0.66(0.16) 0.43(0.06) 0.93(0.04) 0.94(0.08) 


5 

300 10 
20 


0.13(0.04) 0.13(0.04) 0.19(0.07) 0.16(0.05) 0.32(0.18) 0.37(0.12) 
0.24(0.06) 0.18(0.04) 0.36(0.16) 0.24(0.05) 0.85(0.17) 0.59(0.15) 
0.48(0.13) 0.28(0.05) 0.55(0.16) 0.35(0.05) 0.92(0.03) 0.84(0.12) 


5 

400 10 
20 


0.11(0.04) 0.11(0.04) 0.21(0.12) 0.14(0.04) 0.22(0.11) 0.31(0.10) 
0.21(0.04) 0.16(0.04) 0.31(0.11) 0.21(0.05) 0.66(0.22) 0.51(0.13) 
0.31(0.06) 0.25(0.04) 0.49(0.15) 0.29(0.04) 0.98(0.04) 0.76(0.14) 
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In model (4.1), the CS directions are hidden in a comphcated structure and are 
not easy to be detected directly by the conditional regression mean function. When 
sample size is large (> 200) and p is not high (= 5), all the methods have accurate 
estimates. As p increases, rMAVE performs not so well because the second direction 
is not captured in the regression mean function; SAVE and PHD also fail to give 
accurate estimates. SIR performs much better in all the situations than SAVE and 
PHD. dOPG has about the same performance as SIR. dMAVE is the best in all 
situations among all the methods. 

Example 4.2 Now, consider the CS in conditional mean as well as the conditional 
variance as in the following model 



(4.2) 



where X = (xi, • • • , xio)^ with xi, • • • , xio ~ Uniform{—\/^, \/3) and e ~ iV(0, 1) 
are independent, /?i = (1, 2, 0, 0, 0, 0, 0, 0, 0, 2)^ /3 and P2 = (0, 0, 3, 4, 0, 0, 0, 0, 0, 0)^ 
/5. For model (4.2), one CS direction is contained in the regression mean and the 
other in the conditional variance. One typical data with size 200 is shown in Figure 
2. Table 2 lists the calculation results of 200 replications. 



>>0 




Figure 2: A typical data with n = 200 from Example ^.2 and its dMAVE estimation. The 
left two panels are plots of y against the true CS directions respectively; the right two panels 
y against the estimated directions respectively with estimation error 0.31. 



Because rMAVE cannot detect the CS directions hidden in the conditional vari- 
ance directly, it has very poor overall estimation performance as listed in Table 2. If 
d = 1, i.e. the regression mean function is monotonic, SIR works reasonably well; if 
d = 2, the regression mean function is symmetric and SIR fails to find the direction 
hidden in the regression mean. As a consequence, its performance is very poor. 
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The performances of SAVE and PHD are also far from satisfactory though they are 
apphcable to the model theoretically. The proposed dOPG and dMAVE perform 
very well and are better than the existing methods listed in the table. 



Table 2: Mean (and standard deviation) of estimation errors for Example 4.2 



d 


n 


dOPG 


dMAVE 


i-MAVE 


SIR 


SAVE 


PHD 


1 


100 
200 
400 


0.57(0.15) 
0.36(0.08) 
0.24(0.05) 


0.44(0.12) 
0.28(0.06) 
0.18(0.04) 


0.85(0.13) 
0.76(0.16) 
0.68(0.15) 


0.63(0.15) 
0.42(0.09) 
0.29(0.06) 


0.93(0.08) 
0.91(0.12) 
0.64(0.16) 


0.99(0.08) 
0.98(0.07) 
0.97(0.07) 


2 


100 
200 
400 


0.63(0.19) 
0.33(0.10) 
0.22(0.05) 


0.46(0.16) 
0.28(0.06) 
0.19(0.04) 


0.85(0.16) 
0.70(0.18) 
0.66(0.19) 


0.96(0.09) 
0.95(0.07) 
0.95(0.09) 


0.90(0.06) 
0.87(0.11) 
0.85(0.12) 


0.91(0.11) 
0.88(0.11) 
0.89(0.11) 



Example 4.3 In this example, we demonstrate the consistency rates of the esti- 
mation methods by checking how the estimation errors change with sample size n. 
Consider model 



Y 



Xl 



+ X3{x-i + Xi + l) + O.le, 



(4.3) 



0.5 + (1.5 + X2)2 

where e ~ A^(0, 1) and X ~ A^(0, /lo) are independent. Model (4.3) is a combination 
of the two examples in Li (1991). For this model, all the theoretical requirements 
for the methods are fulfilled. Therefore, it is fair to use the model to check their 
consistency rates. 
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Figure 3: The calculation results for Example 4-3 using different estimation methods. The 
lines are the mean of estimation errors with different sample size and 200 replications. The 
left panel is the plot of the errors against sample size; the right panel is the errors multiplied 
by root-n against sample size. 



In the left panel in Figure 3, the proposed methods have much smaller estima- 
tion errors than the inverse regression estimations. Because all the directions are 
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hidden in the regression mean function, it is not surprising that rMAVE has the 
best performance. Multiphed by root-n, the errors should keep in a constant level 
if the theoretical root-n consistency is applicable to the range of sample size. The 
right panel suggests that the estimation errors of SIR and SAVE do not start to 
show a root-n decreasing rate for the sample size up to 1000, while PHD, rMAVE, 
dOPG and dMAVE demonstrate a clear root-n consistency rate. 

Example 4.4 In our last example, we consider a model with a very complicated 
structure. Suppose {Xi,Yi),i = 1,2, •••,n, are drawn independently from model 
Y = l3j X/2 + e{l - \l3jX\'^y/^, where {X,e) satisfies {X ~ iV(0,/io),e ~ A^(0, 1) : 
\f3jX\ < 1, \PJX\ < 1,0.5 < WjXfil - £2) + £2 < 1}^ and ^2 are defined in 
Example 4.1. The calculation results based on 200 replications are listed in Table 3. 
Because of the complicated structure as shown in Figure 4, the CS directions are not 
easy to be estimated and observed directly. However, with moderate sample size, 
the proposed methods can still estimate the directions accurately. It is interesting 
to see that SAVE also works in this example. 



Table 3: Mean (and standard deviation) of estimation errors for Example 4.4 



n 


dOPG 


dMAVE 


rMAVE 


SIR 


SAVE 


PHD 


200 
400 
600 
800 
1000 


0.5909(0.29) 
0.2117(0.19) 
0.1148(0.04) 
0.0876(0.03) 
0.0782(0.02) 


0.5089(0.30) 
0.1498(0.10) 
0.1059(0.03) 
0.0862(0.02) 
0.0779(0.02) 


0.9411(0.07) 
0.9573(0.05) 
0.9725(0.03) 
0.9744(0.03) 
0.9671(0.04) 


0.8770(0.12) 
0.8783(0.13) 
0.8758(0.13) 
0.8737(0.14) 
0.8819(0.13) 


0.9242(0.19) 
0.7677(0.18) 
0.5357(0.21) 
0.3657(0.13) 
0.2604(0.06) 


0.9833(0.05) 
0.9789(0.03) 
0.9799(0.03) 
0.9757(0.04) 
0.9789(0.04) 



Based on the simulations, we have the following observations. (1) The existing 
methods (rMAVE, PHD, SIR and SAVE) fail in one way or another to estimate the 
CS directions efficiently, while dOPG and dMAVE are efficient for all the examples. 
(2) dOPG and dMAVE demonstrate very good finite sample performance, even a 
root-n rate of estimation efficiency, while some of the existing methods do not show 
a clear root-n rate in the range of sample sizes investigated. (3) dOPG and dMAVE 
are less sensitive to the number of covariates than PHD, SAVE and SIR. Simulations 
not reported here also suggest that the asymmetric design of X has less effect on 
dOPG and dMAVE than that on the inverse regression estimations. (4) If the CS 
directions are all hidden in the regression mean function, rMAVE is the best and 
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should be used. Otherwise, dOPG and dMAVE are recommended. 
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Figure 4: yl typical data from Example 4-4 with n — 200 and its dMAVE estimation. The 
upper three panels are plots of y against the true CS directions and y — x^/3i/2 against 
the second direction respectively; the lower three panels are plots of y against the estimated 
CS directions (with estimation error 0.32) and y — a:^/3i/2 against the second estimated 
direction respectively. 



5 Real data analysis 

Example 5.1 (Cars data) This data was used by the American Statistical Asso- 
ciation in its second (1983) exposition of statistical graphics technology. The data 
set is available at http://lib.stat.cmu.edu/datasets/cars.data. There are 406 observations 
on 8 variables: miles per gallon (y), number of cylinders {Xi), engine displacement 
(^2), horsepower (X3), vehicle weight (X4), time to accelerate from to 60 mph 
(X5), model year {X^), and origin of a car (1. American, 2. European, 3. Japanese). 

Now we investigate the relation between response variable Y and covariates X = 
{Xi, • • • , Xg)^, where Xi, • • • , Xg are defined above, Xj = 1 if a car is from America 
and otherwise; X^ = 1 if it is from Europe and otherwise. Thus, {Xj,X^) = 
(1,0), (0,1) and (0,0) correspond to American cars, European cars and Japanese 
cars respectively. For ease of explanation, all covariates are standardized separately. 
When applying dOPG to the data, the first 4 largest eigenvalues are 21.1573, 1.6077, 
0.2791 and 0.2447 respectively. Thus, we consider CS with dimension 2. Based on 
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dMAVE, the two directions (coefficients of X) are estimated as (3i = (-0.33, -0.45, 
-0.45, -0.53, 0.14, 0.42, 0.00, -0.02)^ and (32 = (0.00, 0.15, -0.10, -0.23, -0.12, -0.17, 
-0.88, 0.29)^ respectively. The plots of Y against 'pj X and {5^ X are shown in 
Figure 5. 
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Figure 5: The estimation results for Example 5.1 using dMAVE. The two panels are plots 
of Y against the two estimated CS directions respectively. The origins of cars are denoted 
by "■ " for American cars, "x " for European cars, and "o" for Japanese cars. 

Based on the estimated CS directions and Figure 5, we have the following in- 
sights to the data. The first direction highlights the common structure for cars of 
all origins: miles per gallon (Y) decreases with number of cylinders (Xi), engine 
displacement {X2), horsepower {X3) and vehicle weight {X4), and increases with 
the time to accelerate (X^) and model year {Xq). The second direction indicates 
the difference between American cars and European or Japanese cars. 



Example 5.2 (Ground level Ozone) Air pollution has serious impact on the 
health of plants and animals (including humans); see the report of the World Health 
Organization (WHO) (2003). Substances not naturally found in the air or at greater 
concentrations than usual are referred to as "pollutants". The main pollutants 
include nitrogen dioxide (NO2), carbon dioxide (CO), sulphur dioxide (SO2), res- 
pirable particulates, ground-level ozone (O3) and others. Pollutants can be classified 
as either primary pollutants or secondary pollutants. Primary pollutants are sub- 
stances directly produced by a process, such as ash from a volcanic eruption or the 
carbon monoxide gas from a motor vehicle exhaust. Secondary pollutants are prod- 
ucts of reactions among primary pollutants and other gases. They are not directly 
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emitted and thus cannot be controlled directly. The main secondary pollutant is 
ozone. 

Next, we investigate the statistical relation between the level of ground-level 
ozone with the levels of primary pollutants and weather conditions by applying our 
method to the pollution data observed in Hong Kong (1994-1997, http://www.hku.hk 
/statistics/paper/) and Chicago (1995-2000, http://www.ihapss.jhsph.edu/data/data.htm). 
This investigation is of interest in understanding how the secondary pollutant ozone 
is generated from the primary pollutants and weather conditions. Let Y, N,S,P,T 
and H be the weekly average levels of ozone, nitrogen dioxide (NO2), sulphur diox- 
ide (SO2), respirable particulates, temperature and humidity respectively. To in- 
clude the interaction between primary pollutants and weather conditions into the 
model directly, we further consider their cross-products resulting in 15 covariates 
all together, denoted by X. For ease of explanation, all covariates are standard- 
ized separately. For all possible working dimensions, only the first two dimen- 
sions show clear relations with Y. We further calculate the eigenvalues in dOPG. 
The largest four eigenvalues are 10.78,2.93,2.11,1.70 respectively for Chicago, and 
6.89, 1.24, 0.69, 0.52 for Hong Kong. Now we consider the dimension reduction with 
efficient dimension 2 although the estimation of the mimber of dimension needs 
further investigation. The estimates for the first two directions are given in Table 
4. 



Table 4: The estimated CS directions in Example 5.2 



City 


Direction 


N 


S 


P 


T 


H 


N*S 


N*P 


N*T 




/3i 


0.10 


-0.13 


-0.06 


-0.00 


-0.00 


0.06 


0.29 


0.19 


Chicago 


02 


-0.10 


-0.11 


0.39 


-0.25 


-0.07 


0.12 


-0.15 


0.09 




01 


0.32 


-0.15 


0.23 


0.10 


-0.41 


-0.07 


0.20 


0.42 


Hong Kong 


P2 


-0.04 


-0.08 


-0.12 


0.18 


0.19 


-0.21 


0.35 


0.17 


city 


Direction 


N*H 


S*P 


S*T 


S*H 


P*T 


P*H 


T*H 






01 


0.04 


-0.18 


0.27 


-0.01 


-0.06 


0.36 


0.77 




Chicago 


02 


-0.51 


0.46 


-0.20 


-0.21 


-0.15 


-0.16 


0.32 






01 


0.10 


0.01 


-0.05 


-0.31 


0.53 


0.12 


-0.14 




Hong Kong 


02 


-0.52 


-0.26 


-0.18 


0.42 


0.22 


-0.29 


-0.19 





The plots of Y against the two estimated directions are shown in Figure 6. 
The plots show strong similar patterns in the two separated cities. If we check 
the estimated coefficients (directions), NO2 and particulates (or their interaction) 
are the most important pollutants that affect the level of ozone. Temperature and 
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humidity and their interaction are the other important factors. The interactions of 
weather conditions with NO2 and particulates also contribute to the variation of 
ozone levels. These statistical conclusions give support to the chemical claim that 
ozone is formed by chemical reactions between reactive organic gases and oxides of 
nitrogen in the presence of sunlight; see the report of WHO (2003). 
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Figure 6: The estimation results for Example 5.2 using dMAVE. The upper two panels are 
the levels of ozone against the first two estimated CS directions in Hong Kong, the lower 
two panels are those in Chicago. 



6 Proofs 

6.1 Basic ideas of the proofs 

The basic idea to prove the theorems is based on the convergence of the algorithms 
and that the true dimension reduction space is the attractor of the algorithms. We 
here give a more detailed outline for the proof of Theorem 3.2. Suppose the estimate 
of -Bo in an iteration of the dMAVE algorithm is B(^^-^ . It follows from Step 2 that 

n _, 

k,j,i=l 
n 

X E P?kKhABl)X,,)xl!;i{H,4Y,) - afl - ^(i?of xgi}, (6.1) 
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where X^'^, is defined in the algorithm. By the decomposition in Step 3, we obtain 
estimate -6(^+1) in the next iteration. If the initial value is a consistent estimator 
of -Bo, by Lemmas 6.3, 6.4 and 6.5 below, we will obtain a recurring relation for the 
iterations as 

with \Qt\ < 1 and \Tn,t\ = o(l) almost surely when t > 1. Therefore, the dimension 
reduction space is an attractor in the algorithm. This recurring relation is then used 
to prove the convergence of the algorithm and the consistency of the final estimator. 
To ensure the convergence of the algorithm, we need to consider the consistency 
with probability 1. 

The details of the proofs are organized as follows. In section 6.2, we first list a 
series of lemmas. Lemmas 6.1-6.5. Based on these Lemmas the theorems are then 
proved. The proofs of Lemmas 6.1-6.5 are algebraic albeit complex calculations of 
Lemmas 6.6 and 6.7. They can be found in Xia (2006b) and are available upon 

request. Lemmas 6.6 and 6.7 are two basic results used in the proof dealing with 
the uniform consistency. Their proofs are given in section 6.3. 

6.2 Proofs of the theorems 

We first introduce a set of notations. Let e6,i(y) = Hi,(Yi — y) — E{Hf,(Yi — y)\Xi), 
Vy C M be a compact interior support of Y , i.e. for any v G , there exists ^ > 
such that '^'^^y:\y-v\<5 fyiy) > 0. Similarly, we can define a compact interior support 
for X. For B C {B : B^B = Iq}, define Sb = max{|S - Bq] : B e B}. For any 
index set Z and random matrix An{z), we say An{z) = 0{an\z E Z), ot An{z) = 
0{an) for simplicity, if sup^^2:\^n{z)\/an = 0{1) almost surely. As usual. An = 
Op{an) indicates that every term in An is 0(a„) in probability as n — ^ oo. Recall 
that Bo = (/3oi, /3o2, • • • , Poq) and B = {Pi, 02, • • • , Pq)- Let Hlf{x) = gb{Bjx, y) + 
vMBlx,y)B-^X,^, Hlf{x) = EU=iVlMB],x,yM XMX,,)/2 and 
H!'f{x) = Y.'U,r=i{vUr%{Bl^^y) {01X,,,){PIXMX,,)]/Q, where X,, = X,- 
X, \/gb{vi, ■ ■ ■ ,Vq,y) is defined in Section 2 and 

v'l,K9b{vu--- ,Vq,y) = ^^^^ gb{vi, - ■ ■ ,Vq,y) for l, k = 1,2, ■ ■ ■ ,q, 
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and \/\^r,i9b is defined naturally. By Taylor expansion of gb{BQXi,y) at B^x, it 
follows from model (2.1) that 

H,,,{y) = Hlf{x) + Hlf\x) + Hlf{x) + e,,,{y) + 0{\BlX,,t) (6.2) 

almost surely. Let 5„ih = (n/i™/ log n)~^/^, 5„ihb = (^^/i™b/ log n)^^/^ for any integer 
m, 4 = (n6/logn)-^/2, 5„ = (log n/n)^/^ ^j^^j ^^^^ = /j2 _^ ^4 _^ _^ ^^^^^^^ l^.^. j 

and /y be the density functions of X, X and Y respectively. Again, for simplicity, 
we write fsix), ^ib{x),wb{x) for fB{B^ x), ij,b{B^ x) and wb{B^x) respectively; see 
also the definitions in Section 3. Let c, cq, ci, • • • , be a sequences of positive constants, 
while c may have different values at different places. 

Lemma 6.1 [Kernel smoother in the first iteration] Let 

(cj - { t -y^.^) (;,:/,) ujr t ^^(^«) (;.:/,) 

Under assumptions (CI), (C2) and (C4), if /i ^ 0, 6 — > and nh^^'^b/ logn oo, 
then we have 

I 1 

axy=9b{Bj x,y) + -"^V^ngbiBo x,y)h^ + 0{h^ + 6phb\x eV^,y e V^), 

K=l 

n 

bxy=Bo\7 gb{Blx,y) + {l^2pnh^ fix)}'^"^ Kh{Xi^)Xix£b,i{y) 

i=l 

+0{h'^ + Sphb\x G G Py). 

Lemma 6.2 [Kernel smoother in dOPG] Define Vq = {D = i?diag(Ai, • • • , Xq)B^ + 
5diag(Ag+i, • • • , Xp)B^: {B, Bf{B, B) = Ip, a > min(Ai, ■ ■ ■ , Xq) > cq > 0, B e B 
and max(Ag+i, • • • , Xp)/h?' < e„}. Let 

.f(.,^„-i:A,(z,v.x..,(;jQj 

and 

(if) = {^^n (^)}"'E^'^(^'^'^^-)QJ^M(y)• 
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1 

a^y=9b{Bjx, y) + ^Yl "^IMBqx, y)h^ + 0{h? + 5qhb\x e V^,y e V^, D e Vg), 



Under assumptions (CI), (C2) and (C4), if n/i^+^b/log n oo, b 0,h 0, 
Ss/h — ^ and — ^ 0, then we have 

2 

bg = Bo{s7gbiBlx, y) + ^(/i^ + + e„)} + ^T^ol^^, y) 
where e^/j;, = /i^ + (/i^ + 6qh)6qhb + {h? + (J^/jbjen + (/i + Sqhb/h)SB and 

g n 
T=l i=l 

Lemma 6.3 [Kernel smoother in dMAVE] Let 



and 



Under assumptions (CI), (C2) and (C4), if nh%/\ogn — *■ oo, 6 — 0, /i — and 
Ss/h ^ 0, then 

2 

K=l 

+Vf„(x, y) + 0{h^ + + Ms + 4|x G P^, y G P^, 5 G S), 



1 

«fj/=56(-B^x, y) + \j^gb{Blx, y){Bo - Bfv^ {x) + y^^gbiB^x, y)h^ 



d^yh=S7gb{B])X, y)h + Mg{x, y)h^+ V^^ix, y) 

+0(^V V V + Mb + (5||x G , y e , B e B) , 

where 

Vf„(x, y) = {l + Mg{x, h)h}£^^, {x, y) + Mg{x, h)h£^^^{x, y), 

Vg{x, y) = Mg{x)hS^^,{x, y) + {l + Mg{x, h)h}£g^^{x, y), 

Mg{x),k = 1, 2, • • • , 5, are bounded continuous functions (details can be found in 
the proofs) and 

n 

£n,iix,y) = {nfB{x)}-'Y.KhiB''Xi,)eb,iiy), 

1=1 

n 

£n,2ix,y) = {nhfB{x)}-'Y,Kh{B^Xi,)B^Xi,eb,i{y). 

i=l 
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Lemma 6.4 [Denominator of dMAVE] Let pf^ = p{fB{Xj))p{fY{Yk)), where 

n n 

Mx) = KhiB'X,,), U{y) = n'^ ^ H,{Yi - y). 

i=l i=l 

Let Xf-f^ = df^®Xij where df^, = d^.y^. Suppose (C1)-(C4) hold and nhi+%/logn 
— oo, nb^/ log n ^ oo, b ^ 0,h ^ and Ss/h — 0. We have 

" -1 
{^"' E pfkKhiB'Xi^)Xf^,iXf^,)'] = (7,®S)Lf(7,®B>-2 + (7,®B)L2 

k,j,i=l 

+Ls{Ig B^) + + 0{{rghb + Sqhb)/h\B G B), 

where Li,L2 and L3 are constant matrices (details can be found in the proof) and 
Db = J pUb{x))p{U (y)) V 9b{Blx,y) ^'^gbiB^x, y) ® {vj^{x)vl{x)]f{x)f{y)dxdy. 

Lemma 6.5 [Numerator of dMAVE] Suppose conditions (C1)-(C4) hold. If 6 ^ 
0, /i — >■ 0, nh'^b/ logn — > 00, n5^/ logn — > 00 and Ss/h ^ 0, then 

n 

n''' J2 pfkKh{B'X,,)Xf^,{H,,{Yk) - af, - iiBoY Xf^,} = DB{m - i{B^)) 

k,j,i=l 

+^n{Bo) + Oih" + rghbSqhb + Slhb + ^l/b^ + i^qhb/h + h)dB\B e B}, 

where af,^ = a^^^^ , $„(Bo) = 0{5n + S^f^Jh) almost surely and $„(Bo) = Op(n-V2) 
with {Ig<^Bj)^n{Bo) = and ^/n^niBo) ^ A^(0, Sq), where So is given in Theorem 
3.2. 

Proof of Theorem 3.1 By Lemma 6.1, write 

n 

bxy = BoCn{x, y) + {l^2pnhlf{x)y'^ ^ Kh^ {Xix)Xi:c£b^^i{y) + BQO{hl + 6phobo), 

i=l 

where {Bq,Bq) is a p x p orthogonal matrix and Cn{x,y) = '\/gfj{Bj x,y) + ©(/ig + 
^ph^fio)- By Lemma 6.6, the second term on the right hand side above \sO{5ph^i)^/hQ)- 
It follows from step 2 in the dOPG algorithm that 

n 

^(1) = (-Bo, -Bo)C„(i?o, -Bo)^+ n"^^ {Sijk + Sjjf.) 

i,j,k=l 

+0{{hl + Sphobo)Sphobo/ho}, (6.3) 
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n 

Cn = n 

j,k=l 



where and p^-^^ are defined in the algorithm, Sijk = pfk {l^2phQf{Xj)} ^ 

Bo V 9b, {BlXj, Yk)Kh^ {Xij)Xlei,^^^,{Yk) and 

-2 (0) / '^n 

{Xj,Yk) \( Cn{Xj,Yu) V 

Z^^P^k \o{hl + 5,nobo)) \0{hl + 5,u,b,)) 
where K^n'^ = n-^Y.lk=ipfkCn{Xj,Yk)cl{X,,Yk). By Lemma 6.6, we have (y) = 

fAy) + f"iyK/2 + + \ \y e v^), = fix) + + <5p^, \x g p^). By 

the definition of /)(.), we have p^°^ = (y)) + O(rp/iofeja; eRP,y e M), 

where P6o(/.(y)) = (?/))+ P'(/.(y))/^(y)&i/2. Let 

x{p2phy{X,)}-^K^^iXij)Xleh^,i{Yk). 

By (C5) and Lemma 6.7, we have n-^^l^^^^^Sijk = 0{{Sn + (^^^^^ + 5^/5^)/^o}- 
Thus, 

n n 

E Sijk = n-' Sijk + 0{rpn,bMo^ho'} = oC&, (6.4) 

i,j,k=l i,j,k=l 

where 

aI^^ = (^n/^o + ^Ihob/ho + Sl/{blho) + h^ + rph^^b^Sph^^b^h^^ ■ 

By (C3) and the strong law of large numbers for U-statistics (cf. Hoeffding, 1961), 
= /p(/(a;))p(/y(y)) V 9bo{B^ x,y) v^5&o(^([ x,y)}fix) fAy)dxdy + o(l) al- 
most surely, which is of full rank asymptotically. Thus its eigenvalues are greater 
than a positive constant asymptotically. On the other hand, the eigenvalues of the 
lower right principal submatrix in C„ are of order A^^ Let A^^^^ > ... > Ap^^ be 
the eigenvalues of S(i) and fi^\ ■ ■ ■ , Pp''^ be the corresponding eigenvectors. By 
the interlacing theorem (cf. Ando, 1987), we have min{A^^\ ■ • • , Aq^^} > c and 
max{Aj^_Ji, • • • , Aj,^^} = 0(X^'^). By (6.3) and (6.4) we have 
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By Lemma 3.1 of Bai et al (1991), we have 

5(1)^5) -5oSo^ = 0((5«). (6.6) 

Let t = 1. Consider the {t + l)th iteration. Let £n\{x, y) = <?^o' (x, y) as defined in 
Lemma 6.2. By the conditions on bandwidths in (C5), wc have ei^) Ai^V/i? ^ 
and 5^V^i ~^ 0- -^y Lemma 6.2, similar to (6.3), we have from the algorithm 

n 

= (So, Bo)Cj^HBo, BoV + ^ {5« + (5«)^} + ^(e,,,;., (6-7) 

j,k=i 

where = p5|i?o{ V% {BlX„Yk) + ^(/i^ + <5,;,, + e^rt^)}{£^liX„Yk)y and 

r-W ^ ( 0{eghb) \ 

where A^*^ = ^^^^^^ pf^ y 56, (^J , n) V^56, (^J , n) + 0{hl + V* + e^*^ }• 
Note that i3j^f,W(^.^y^) = q, 4*J,(X,-,n) = 0(^*6*) and B-^ £^^l{X^,Yk) = 
OiSghtbtSs')- It follows that 

n 

„-2"^ /q(*) _l /'c(*)\T 



n 

{Bo, Bo) (So,5o)^n-3^{5j2 + (sj2)^}(5o,Bo) (So, So)" 



= (So,So) (,) '''^^)iBo,BoV + 0{dgh,,,d'i\ (6.8) 

whereC^l^ = n-^Y:lk=iPfk{'7gb,{BjXj,Y,) + 0{hl + SgH,+e^^^^^^^ 

Bo. Similar to pSJ, we have pf^^ = pft,+0{rqhtbt) where pJJ = p(/Bo(Xj)){p(/y (Ffe)) 

+p'(/y(n))/^(^fc)btV2}. By (C5) and Lemma 6.7, we have 



n 



i,fc=i 

=0((5„ + 5l^^^^ + 526-2 + Tgh^bAhtbt + 4^^<ih,b,). (6.9) 

By the strong law of large numbers for U-statistics, it follows A^*^ = Mo + o(l) 
almost surely, where Mq is defined in (C3). Let A^"*"^-* > ... > Ap"*"^^ be the eigen- 
values of ^{t+i) and B^^_^l^| the first q eigenvectors. By the same arguments as 
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for x!k\ it follows from (6.7), (6.8) and (6.9) that min{Af+^\ A^+^^j > c and 
max{Aj*^!]^\...,A{,*+^^} = 0{Ai*+^^}, where A^*"^^^ = eqUtht^qhtbt + ^lh,bt + Vt6t<^B ■ 
Considering en^^^ht-\-i Ai*'''^'*//it+i, there exists a constant ci, which does not 
depend on t, such that 

er^^,+i < cUx^l + Xt4^h + x^^g)}, (6.10) 

where Xo*n = {hi + hf6qhtbt+SqhtbAht)Sqhtbt/ht+l, Xi*n = {hf+6qhtbt)Sqhtbt/ihtht+l) 

and X2*l = Sqhtbt/ht+i- By (6.7) and (6.8), we write 

^(t+l) = -BoA^*^-Bo + BoC^2,n^O + -^o(C'i2,n-So)^ + 0{eqhtbt + <^q/Jt6t<^B''}) (S-H) 

where (5^2 n ^'^^ ^''^^ term on the right hand side of the first equation in (6.9). By 
the same arguments as for (6.6), we have -B(t+i)-Bj^^-) — BqB^ = 0{5qhtbt{^qhtbt + 
rqh^b,) + {hi + rqhtbt)e^n^ + {h + 5qhb/h)5^S +^n + ^l/hl + hi). That is 

<^r^<C2{xa + X?W)/., + xa<5g)} (6.12) 

for a constant C2 independent of t, where x^^^ = Sqhtbt{Sqhtbt+rqhtbt)+hf+6^/bf+Sn, 
X4*i = {ht+'rqhtbt)/ht and xiji = ht + ^qhtbt/h- Note that ht and 6t decreasing with 
i, by (C5) we have Sqhtbt/ht+i < ^qm/h — 0. It follows that en^"*^^ = \n^^^ /h'^.^i 
0, 5^+^^ = 0{rqh^bt) and ^ 0. Recursing (6.10) and (6.12), it follows 

that 

4°°^ = Oixtri + Xl?xi?} = 0{h' + Sq^iSq^ + + 6^) + <5^/6^ + Sr.} 

and en°°'* = 0{5qfit))- This is the first part of Theorem 3.1. By (6.11) and the 
equations above, write 

S(oo) = {Bo + Vn}^t\Bo + 7jny + 0{h^ + Sqm{dqm>+-b^) + Sl/d^}, 

where rjn = C'ig(A^°°V' = 0{h'' + 5q^{dq^ + 5^) + S^/d^ + Sn}. Note that 
^(L)^B(^) (^) = and thus Bj^^rin = 0. We have A„ =^ (Sq + VnV{Bo + ??n) = 
Iq + 0{dl). Let = {-Bo + r]n}A.n^^^. It follows that 

S(oo) = %Ai°°^»7n +C?{^^ + '^.?^('^.m,+5^)+^n/5'}■ 
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Let BdOPG be the first q eigenvectors of S(oo)- By Lemma 3.1 of Bai et al (1991), 
we have 

BdOPoBjoPG - BoB^ = BoVn + VnB^ + + 6gm{Sgm + 6l/d^}. (6.13) 
By Lemma 6.7 and (C5), we have 

n 

vn = n-2^p(/Bo(^i))/>(/.(n))4:^^(^i,n)v^56(So^,-,n)(Ai°°^)-' 
j,k=i 

n 

= n-i^p(/Bo(X,))p(/,(y,))<(^0^.„(^0C7(Ai°°))-' + 0{r,^V}, 

i=l 

where Q = ^g^{B'^Xi,Yi)U{Yi) - E{^gt,{B'^Xi,Yi)U{Yi)\B'^Xi}. Let Q = 
^fiYi\BlXi)U{Yi) -E{s7fiYi\BlXi)U{Yi)\B'^Xi}. As 6 ^ 0, we have Ai°°) ^ 
Mo almost surely, where Mq is defined in (C3). By calculating the mean and co- 
variance matrix, we have 

n 
i=l 

It follows from the two equations above and the conditions in the Theorem for the 
bandwidths 

//, 

Vn = n-'Y,pifBo{Xi))p{fAm^to(^i>BMi)ClM^' + op{n-'/^). (6.14) 

i=l 

After vectorizing r/„, the second part of Theorem 3.1 follows from (6.13), (6.14) and 
the central limit theorem. □ 
Proof of Theorem 3.2 Consider the initial estimator B^^i^ in (6.6). Let Q = 
B^^^Bq. For simplicity, we assume Q = Iq, otherwise, we may use basis BqQ and 
consider the expansion in Lemmas 6.3, 6.4 and 6.5 at {BqQY^x. Let 5^^ be the 
consistency rate of the estimator in the t'\h iteration. Write ^(Sq) = {Iq (8) Bo)£{Iq). 
By the definition of in Lemma 6.4, it follows 

{Iq ® B)'^ Db = 0, Iq®B = Iq®BQ + 0{8B), {Iq ® Bq^ ^^{Bq) = Q. (6.15) 

By the definition of the Moore-Penrose inverse we have D~^Db = Iq®{BB^), where 
{B,B) is a p X p orthogonal matrix. By Lemmas 6.4, 6.5 and (6.1), for every 
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in B = {B:\B- Bo\ < ^g^}, if s''^' /ht ^ we have 

b(*+l) = (/^0 5o){W+O(4*^)} + ^*(t)W5(t))-W} + ^^J)^n(5o) 

+0{At + (ht + Sgh,bJht)~S^S}^ (6-16) 

where = hf + {hj + bi + 5,^,6,) + ^l/b^, ^1*^ = {A^ + {d,h,bjht + ht)s'i>}/hl 
D^t) = Db^^) and ^^t) = Iq® (-^(t)-^(T)) = * + > where * = Jq (g) (SoSj") is a 
projection matrix and (5o, Bq) is a p x p orthogonal matrix. We have 

A1(b(*+i)) ^ 5oAW + l>i(^-{£(S(,))-£(5o)}) + ^A^(i^J)$n(5o)) 
where A^*^ = + 0(cn'*) and M.{.) is defined in section 2.2. Note that 

where 5n = 5n + If c4*^ = almost surely, then by Step 3 

= So + - £(i?o)}) + \m{D+^^^{B^)) 

= Bo + ^Min^iB^t)) - KBo)}) + 0{Sn + At + {ht + Sgh,bJht)~6^S}- (6-17) 

By (C5) and (6.6), we have Sqh^b^/hf < Sqf^/h^ — 0, /h\ and dk^ 
almost surely. Thus (6.17) holds for t = 1. By assumption (C5), it follows that 
^bV^2 = o(l) and Cn^ = o(l) almost surely. Thus (6.17) holds for t = 2. Recurring 
the formula, we have 

A more detailed deduction was given in Xia, Tong and Li (2002). Therefore, the 
first part of Theorem 3.2 follows immediately. By the first equation of (6.17) with 
t = oo and Lemma 6.5, we have 

Bioo)-Bo = ^A^(*{£(S(oo))-^(i?o)}) + ^A^(l?Jo)^-(^o)) 

+Op{h'' + {h" +t>'' + 6gf^)dgra,}. 
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Multiplying both sides by Bq, by (6.15) we have 

bJb^^^-i = Op{h'' + {h^+d^ + 5gf^)6gf^}. 

It follows that 

^{oo)^M^o-Bo = ^Mm^{B^oo))-£{Bo)}) + lM{D+^^MBo)) 

+Op{fi' + {h''+d'' + Sgr^)dq^}. 

Note that ^i^J^) = D+^^ + Op {6^^^). We have 

£(i?(^)i?J,)i?o) - ^(^o) = D+^^^niBo) + Op{h^ + {h'' + 5gnb)Sgm}- 

This is the second part of Theorem 3.2. □ 

6.3 Auxiliaries 

Lemma 6.6 Suppose m„(x,Z),n = 1,2, • • • , are measurable functions of Z with 
index x ^ ^"^^ where d is an integer, such that (I) |m„(x, -2^)1 < M{Z) with 
E{M''{Z)) < oo for some r > 2; (II) sup^ £^|mn(x, < and (III) |m„(x,^) - 
m„(x',^)| < |x-x'l"'""'G(Z) with some ai,a2 > and £"10(^)1 < oo. Suppose 
{Zi,i = 1, • • • , n} is a random sample from Z. If a„ = cn~^ with < S < 1 — 2/r 
and c > 0, then for any positive ao we have 

n 

' Y^imnix, Zi) - Erunix, Z,)} = 0{(a„ log n/n)^/2| 



n 

1=1 



sup 

|x|<n«o 

almost surely. 

Proof of Lemma 6.6 The "continuity argument" approach is used here. See, 
e.g. Mack and Silverman (1982) and Hardle et al (1993). Note that P„ {|x| < 
n°o} is bounded and its Borel measure is less than cin""'^ for some constant ci. 
There are n"^* (04 > a^d + (1 + a2)dla\) balls Bn^ centered at Xn^j ^ < k < n°'*, 
with diameter less than C2n~(-'^+"^2)/ai^ g^ch ^Jia^ Ui<jt<„a4 -Bnfc- It follows that 

sup \-y2{mn{x,Zi) - Emn{x,Zi)} 
I 1 " 

l<K<n™4 I n 

1=1 
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1 " 

+ max sup - V {mn(x,^i) - "T'„(xnfe,^i)} 

-E{mn{x, Zi) - mn{Xnk,Zi)} 
def 

= max \Rn,k,i\+ max sup \Rn,k,2\- (6.18) 

l<fc<n"4 l<A;<n«4 -^gB„^ 

By condition (III) and the definition of -B„^, we have 

, max sup \mn{x,Zi)-mn{Xnk,Zi)\ < max sup n"^!;^-;^ |°iG(Zi) 

< C3n-^G{Zi). 

By the strong law of large numbers, we have 



max sup \Rn,k,2\<C4n-^y2{G{Zi)+EG{Zi)} = 0{n-^) (6.19) 

almost surely. Let = (na„/ log r^)^/^ m^(xn^, .Zj) = runixuk^ Zi)I{\M{Zi)\ > r„} 
and m^(xnfe, -^i) = mn(xnfc, -^i) - m°(xnfc, -^i)- Write 

^.^.1 = - E [<(Xn„ ^0 - E{m"^{xn„Zi)}] in„i, (6.20) 

i=l i=l 

where ^n^,j = mf^^Xuk^Zi) - E{mj^{xnk^ ^i)}. By the truncation, it follows that 

E\m"^{Xn„Zi)\ <T-^+'E\M{Z,W. 
If a„ = cn""^ with < (5 < 1 — 2/r, we have 

n 

n-i| ^i?m°(xn„^i)| < E\M{Z^)\^T-^+' = o({a„ log(n)/n}V2). (6.21) 

i=l 

Again by the truncation, we have 

J2 \<{xn„Zi)\ < J2 mzmmzi)\ > r„) < r^+^x; |M(z,)r/(iM(z,)i > r„). 

i=l i=l 1=1 

For fixed T, by the strong law of large numbers, we have 



n 

-1 

n 

1=1 



^ \M{Z,)\^I{\M{Z,)\ > T) ^ E{|M(Zi)ri(|M(Zi)| > T)} 
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almost surely. The right hand side above is dominated by E{\M{Zi)\^} and — as 
T — oo. Note that increase to oo with n. For large n such that r„ > T, we have 



n 

def _i 



Cn =' J2 \M{ZiWl{\M{Zi)\ > Tn) < n"! \M {ZiW I {\M (Z^)] >T)^0 



i=l i=l 

almost surely as T — > cxd. It follows 



l<k<n°'i ' 
1=1 

almost surely. By condition (II), we have 

n 

max Var(>^£„. j) < n max E{ml,{~Xnu, 



l<fe<n"4 l<fe<n"4 



< n max E{mn{Xnk, ^i)} < c^nan = Ni. (6.23) 

l<fc<n"4 

By the condition on a„ and the definition of ^n^.i; we have 

max < ceT^ = ceinaj log n) ^2 N2. (6.24) 

l<fc<n™ 

Let iV3 = C7(na„logn)-'^/2 -^^i^h Cy > 2(0:4 + 2)(c5 + cec/). By the Bernstein's 
inequality (cf. DE LA Pena, 1999), we have from (6.23) and (6.24) that 

P(lE?„.,.l>iV3) < 2exp(5^^^^-^j 

< 2exp{-c?logn/(2c5 + 2c6C7)} 

It follows that 

oc n 00 n 

E P^(,<T<\^.. I E ^--^1 > ^3) < E n- ^ max ^ Pr(| ^n,,! > ATa) < 00. (6.25) 

n=l i=l n=l i=l 

By the Borel-Cantelli lemma (cf. Chow and Teicher, 1978, p.60), we have 

n 

,i?|^«jEW^I = ^TO (6.26) 

i=l 

almost surely. Combining (6.20), (6.21), (6.22) and (6.26), we have 



max \Rn,k,i\ = 0{(a„log(n)/n)V2} (6.27) 
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almost surely. Lemma 6.6 follows from (6.18), (6.19) and (6.27). □ 
For any function G{Xi,Yi,Xj, Yj,Xk,Yk) (or G{Xj, Yj,Xk,Yk)), we introduce 
a projection operator Ej. as follows. 

EkG{Xi, Yi, Xj,Yj,Xk, Yfe) = E{G{Xi, Yi, Xj,Yj,Xk, Yk)\Xi, Yi, Xj,Yj}. 

Lemma 6.7 Let A = {A : A = /«} with 1 < k < p. Suppose go{y), gi{x), g2{x) 
are bounded continuous functions. If conditions (C2) and (C4) hold with B replaced 
by A for all A e A, then 

n 

^ Kh{A^Xij)gi{Xi)g2iXj)goiYk)vgb{B^Xj,Yk)eb,iiYk) 

i,j,k=l 

n 

= n-' EjEk{Kh{A^ Xij) y gb{B^ Xj ,Yk)eb,i{Yk)} + 0{<i,hb\A G A), 
1=1 

where <jK/ib = 6^h'''^b~'^ + (5^^^ + and the first term on the right hand side is 

Proof of Lemma 6.7 For easy of exposition, we consider gk = l,k = 0,1,2 
only. Let An{A) be the left hand side of the equation in the lemma. Let iPi^{s) = 
f c^p{is^u)K{u)du and ^Pjj{t) = {2tt)^^ f cxp{itv)H{v)dv be the Fourier 
transformations, where i is the imaginary unit. It follows from the inverse Fourier 
transformation that gb{u,y) = b'^ J if„{t')e-'^'y'^E{e'*'^/\ X = u}dt' . Thus 

Vgf,iBjXj,Yk) = b-^ I <pJt')vUBjXj)e-'''^'^/''dt', (6.28) 

where V9b{u) = dE{e'^'^/''\B^ X = u)/du. We have 

\ r " 

An(^) = ^^JV'At') Y i^h{A'^^^)'^9b{B^^j)-Ej[Kn{A'^Xi,) 

i,j,k=l 

X V ~gb{BjX,)]}{ei„{Yk)e-''''''^/' - £;fcK,(n)e-'*'^'=/^]}dt' 
I "^"^^'^ ^ Ej[K^{A'^Xij)v~g,{BjXj)] 

i,k=l 

x{e6,.(n)e-'*'^'=/^ - i^fch,.(n)e-'*'^'=/^]}di' 
^^b f ^-^^'^ ^ EkMYk)e-^''''>'l']{Kt,{A'X,,)y~g,{BlX,) 

-Ej[Kh{A^Xi,)^UB'^X^)]}dt' 
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1 r " 

1=1 

An,i{A) + An,2{A) + A„,3(^) + A„,4(A). (6.29) 



By the inverse Fourier transformation, it follows that Kh{A^Xij) = h J fii{s) 
^-is-^A^Xii/h^^ and Hb{Yi - Yk) = h'^ J ip„(t)e-<^'-^''^/^dt . Thus 



where 

m,,n{A, s, t, t', X„ ¥,) = e-"^"^-/'' y gb{B^X,) - £;[e-^^^^^/'» y 9b{BjXi)], 
m2,n{A, s, t, t', X,, Yi) = e^(t~m/^ _ E[e^(t-m/b^ 

and 

ms,n{A,s,t,t',Xi,Yi) = e-^'^^/' - E{e-''^^/'\Xi). 

By (C2), we have that | y gb{u)\ < / | y fo{y\u)\dy is bounded. For any r > 2, 
it follows that sup^/ E{\y'gi,{BQ Xi)Y < c and that 

sup ^ E\me,,,{A, s, t, t', Xi, < c, £=l,2, 3, 

where c is a finite constant. For any ao > 0, let = {{t,t',s) : \t\ < n°'° , \t'\ < 
n°'o, \s\ < n"o}. By taking % = {A,t,t',s) and a„ = c, we have from Lemma 6.6 

n 

sup n-^\y^me,n{A,s,t,t',Xi,Yi) = 0(5„), £=1,2,3 (6.30) 
almost surely. On the other hand, \m£^n{A, s,t,t' , Xi,Yi)\ is bounded. Thus, 

n 

sup n-^\^me,n{A,s,t,t',Xi,Yi) =0(1), £=1,2,3. (6.31) 
AeA{t,t',s) ' i=i 

By (C4), the Fourier transformation functions and iPu{.) are absolutely inte- 

grable; see Chung (p. 166, 1968). We can choose ao such that 

/ \vAs)\ds = 0{5l), I \^„{t)\dt<0{Sl). (6.32) 
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Partition the integration region in A„^i(A) into two parts, we have from (6.30)- (6.32) 
that 



sup 
AeA 



X |(^^ {s)ipH {t)<fH {t')\dsdtdt' 

X \tpj^ {s)ipjj {t)ipu {t')\dsdtdt' 
= (h'^b^r'Oid^^) I \ipAs)^,{t)ipAt')\dsdtdt' 

+{h%'')-'0{l) [ \^,{s)^,{t)^,{t')\dsdtdt' 

= 0{5lh-%-'^) (6.33) 

almost surely. Let g{Xi) = Ej[Kyi[A^ Xij) y 9b{BQ Xj)\. It is easy to see that 
g{Xi) = 0(1) almost surely. Applying the inverse Fourier transformation to £6,i(lfe) 
and using similar arguments leading to (6.33), we have 

sup A„,2(^) =0{5lb-'') (6.34) 

almost surely. Applying the inverse Fourier transformation to Kh{A^ Xij), similar 
to (6.33) we have 

sup A„,3(^) = 0{5lh-''h-^) (6.35) 

AeA 

almost surely. By (6.28), we have 

n 

A„,4(A) = n-^Y.^3Ek{Kh{A^ Xij) y gb{B^ Xj,Yk)eb^i{Yk)}. 



i=l 



By Lemma 6.6, we have 



sup A„,4(^) = 0{5n) 

AeA 



(6.36) 



almost surely. Finally, Lemma 6.7 follows from (6.33)-(6.36) and (6.29). □ 
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